The Shallowness of Google Translate
The program uses state-of-the-art AI techniques, but simple tests show that it's a long way from real understanding.
Douglas Hofstadter, Jan 30, 2018

One Sunday, at one of our weekly salsa sessions, my friend Frank brought along a Danish guest. I knew Frank spoke Danish well, since his mother was Danish, and he, as a child, had lived in Denmark. As for his friend, her English was fluent, as is standard for Scandinavians. However, to my surprise, during the evening’s chitchat it emerged that the two friends habitually exchanged emails using Google Translate. Frank would write a message in English, then run it through Google Translate to produce a new text in Danish; conversely, she would write a message in Danish, then let Google Translate anglicize it. How odd! Why would two intelligent people, each of whom spoke the other’s language well, do this? My own experiences with machine-translation software had always led me to be highly skeptical about it. But my skepticism was clearly not shared by these two. Indeed, many thoughtful people are quite enamored of translation programs, finding little to criticize in them. This baffles me.

As a language lover and an impassioned translator, as a cognitive scientist and a lifelong admirer of the human mind’s subtlety, I have followed the attempts to mechanize translation for decades. When I first got interested in the subject, in the mid-1970s, I ran across a letter written in 1947 by the mathematician Warren Weaver, an early machine-translation advocate, to Norbert Wiener, a key figure in cybernetics, in which Weaver made this curious claim, today quite famous:

When I look at an article in Russian, I say, “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”

Some years later he offered a different viewpoint: “No reasonable person thinks that a machine translation can ever achieve elegance and style. Pushkin need not shudder.” Whew! Having devoted one unforgettably intense year of my life to translating Alexander Pushkin’s sparkling novel in verse Eugene Onegin into my native tongue (that is, having radically reworked that great Russian work into an English-language novel in verse), I find this remark of Weaver’s far more congenial than his earlier remark, which reveals a strangely simplistic view of language. Nonetheless, his 1947 view of translation-as-decoding became a credo that has long driven the field of machine translation.

Since those days, “translation engines” have gradually improved, and recently the use of so-called “deep neural nets” has even suggested to some observers (see “The Great AI Awakening” by Gideon Lewis-Kraus in The New York Times Magazine, and “Machine Translation: Beyond Babel” by Lane Greene in The Economist) that human translators may be an endangered species. In this scenario, human translators would become, within a few years, mere quality controllers and glitch fixers, rather than producers of fresh new text. Such a development would cause a soul-shattering upheaval in my mental life. Although I fully understand the fascination of trying to get machines to translate well, I am not in the least eager to see human translators replaced by inanimate machines. Indeed, the idea frightens and revolts me.
To my mind, translation is an incredibly subtle art that draws constantly on one’s many years of experience in life, and on one’s creative imagination. If, some “fine” day, human translators were to become relics of the past, my respect for the human mind would be profoundly shaken, and the shock would leave me reeling with terrible confusion and immense, permanent sadness.

Each time I read an article claiming that the guild of human translators will soon be forced to bow down before the terrible swift sword of some new technology, I feel the need to check the claims out myself, partly out of a sense of terror that this nightmare just might be around the corner, more hopefully out of a desire to reassure myself that it’s not just around the corner, and finally, out of my longstanding belief that it’s important to combat exaggerated claims about artificial intelligence. And so, after reading about how the old idea of artificial neural networks, recently adopted by a branch of Google called Google Brain, and now enhanced by “deep learning,” has resulted in a new kind of software that has allegedly revolutionized machine translation, I decided I had to check out the latest incarnation of Google Translate. Was it a game changer, as Deep Blue and AlphaGo were for the venerable games of chess and Go?

I learned that although the older version of Google Translate can handle a very large repertoire of languages, its new deep-learning incarnation at the time worked for just nine languages. (It’s now expanded to 96.) Accordingly, I limited my explorations to English, French, German, and Chinese.

Before showing my findings, though, I should point out that an ambiguity in the adjective “deep” is being exploited here. When one hears that Google bought a company called DeepMind whose products have “deep neural networks” enhanced by “deep learning,” one cannot help taking the word “deep” to mean “profound,” and thus “powerful,” “insightful,” “wise.” And yet, the meaning of “deep” in this context comes simply from the fact that these neural networks have more layers (12, say) than do older networks, which might have only two or three. But does that sort of depth imply that whatever such a network does must be profound? Hardly. This is verbal spinmeistery.

I am very wary of Google Translate, especially given all the hype surrounding it. But despite my distaste, I recognize some astonishing facts about this bête noire of mine. It is accessible for free to anyone on earth, and will convert text in any of roughly 100 languages into text in any of the others. That is humbling. If I am proud to call myself “pi-lingual” (meaning the sum of all my fractional languages is a bit over 3, which is my lighthearted way of answering the question “How many languages do you speak?”), then how much prouder should Google Translate be, since it could call itself “bai-lingual” (“bai” being Mandarin for 100). To a mere pilingual, bailingualism is most impressive. Moreover, if I copy and paste a page of text in Language A into Google Translate, only moments will elapse before I get back a page filled with words in Language B. And this is happening all the time on screens all over the planet, in dozens of languages.

The practical utility of Google Translate and similar technologies is undeniable, and probably it’s a good thing overall, but there is still something deeply lacking in the approach, which is conveyed by a single word: understanding. Machine translation has never focused on understanding language.
Instead, the field has always tried to “decode”—to get away without worrying about what understanding and meaning are. Could it in fact be that understanding isn’t needed in order to translate well? Could an entity, human or machine, do high-quality translation without paying attention to what language is all about? To shed some light on this question, I turn now to the experiments I made.

I began my explorations very humbly, using the following short remark, which, in a human mind, evokes a clear scenario:

In their house, everything comes in pairs. There’s his car and her car, his towels and her towels, and his library and hers.

The translation challenge seems straightforward, but in French (and other Romance languages), the words for “his” and “her” don’t agree in gender with the possessor, but with the item possessed. So here’s what Google Translate gave me:

Dans leur maison, tout vient en paires. Il y a sa voiture et sa voiture, ses serviettes et ses serviettes, sa bibliothèque et les siennes.

The program fell into my trap, not realizing, as any human reader would, that I was describing a couple, stressing that for each item he had, she had a similar one. For example, the deep-learning engine used the word “sa” for both “his car” and “her car,” so you can’t tell anything about either car-owner’s gender. Likewise, it used the genderless plural “ses” both for “his towels” and “her towels,” and in the last case of the two libraries, his and hers, it got thrown by the final “s” in “hers” and somehow decided that that “s” represented a plural (“les siennes”). Google Translate’s French sentence missed the whole point.

Next I translated the challenge phrase into French myself, in a way that did preserve the intended meaning. Here’s my French version:

Chez eux, ils ont tout en double. Il y a sa voiture à elle et sa voiture à lui, ses serviettes à elle et ses serviettes à lui, sa bibliothèque à elle et sa bibliothèque à lui.

The phrase “sa voiture à elle” spells out the idea “her car,” and similarly, “sa voiture à lui” can only be heard as meaning “his car.” At this point, I figured it would be trivial for Google Translate to carry my French translation back into English and get the English right on the money, but I was dead wrong. Here’s what it gave me:

At home, they have everything in double. There is his own car and his own car, his own towels and his own towels, his own library and his own library.

What?! Even with the input sentence screaming out the owners’ genders as loudly as possible, the translating machine ignored the screams and made everything masculine. Why did it throw the sentence’s most crucial information away?

We humans know all sorts of things about couples, houses, personal possessions, pride, rivalry, jealousy, privacy, and many other intangibles that lead to such quirks as a married couple having towels embroidered “his” and “hers.” Google Translate isn’t familiar with such situations. Google Translate isn’t familiar with situations, period. It’s familiar solely with strings composed of words composed of letters. It’s all about ultrarapid processing of pieces of text, not about thinking or imagining or remembering or understanding. It doesn’t even know that words stand for things. Let me hasten to say that a computer program certainly could, in principle, know what language is for, and could have ideas and memories and experiences, and could put them to use, but that’s not what Google Translate was designed to do. Such an ambition wasn’t even on its designers’ radar screens.
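(For readers who want to try this kind of probe themselves, here is a minimal sketch, in Python, of the round-trip test I performed by hand. The translate() function is a hypothetical placeholder rather than any particular product's API; wire it to whatever translation service you have access to.)

# A minimal sketch of the round-trip probe described above.
# translate() is a hypothetical placeholder, not a real library call:
# substitute any machine-translation service that maps
# (text, source language, target language) to a string.

def translate(text: str, src: str, dst: str) -> str:
    return text  # identity stand-in; replace with a real MT call

def round_trip(sentence: str, src: str = "en", dst: str = "fr") -> None:
    forward = translate(sentence, src, dst)    # e.g., English to French
    backward = translate(forward, dst, src)    # ... and back again
    print("original:  ", sentence)
    print("forward:   ", forward)
    print("round trip:", backward)
    # A faithful engine should at least preserve who owns what; the
    # gender-marking trap above shows how badly this can go wrong.

round_trip("In their house, everything comes in pairs. There's his car "
           "and her car, his towels and her towels, and his library and hers.")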
Well, I chuckled at these poor shows, relieved to see that we aren’t, after all, so close to replacing human translators by automata. But I still felt I should check the engine out more closely. After all, one swallow does not thirst quench.

Indeed, what about this freshly coined phrase “One swallow does not thirst quench” (alluding, of course, to “One swallow does not a summer make”)? I couldn’t resist trying it out; here’s what Google Translate flipped back at me: “Une hirondelle n’aspire pas la soif.” This is a grammatical French sentence, but it’s pretty hard to fathom. First it names a certain bird (“une hirondelle”—a swallow), then it says this bird is not inhaling or not sucking (“n’aspire pas”), and finally reveals that the neither-inhaled-nor-sucked item is thirst (“la soif”). Clearly Google Translate didn’t catch my meaning; it merely came out with a heap of bull. “Il sortait simplement avec un tas de taureau.” “He just went out with a pile of bulls.” “Il vient de sortir avec un tas de taureaux.” Please pardon my French—or rather, Google Translate’s pseudo-French.

From the frying pan of French, let’s jump into the fire of German. Of late I’ve been engrossed in the book Sie nannten sich der Wiener Kreis (They Called Themselves the Vienna Circle), by the Austrian mathematician Karl Sigmund. It describes a group of idealistic Viennese intellectuals in the 1920s and 1930s, who had a major impact on philosophy and science during the rest of the century. I chose a short passage from Sigmund’s book and gave it to Google Translate. Here it is, first in German, followed by my own translation, and then Google Translate’s version. (By the way, I checked my translation with two native speakers of German, including Karl Sigmund, so I think you can assume it is accurate.)

Sigmund: Nach dem verlorenen Krieg sahen es viele deutschnationale Professoren, inzwischen die Mehrheit in der Fakultät, gewissermaßen als ihre Pflicht an, die Hochschulen vor den “Ungeraden” zu bewahren; am schutzlosesten waren junge Wissenschaftler vor ihrer Habilitation. Und Wissenschaftlerinnen kamen sowieso nicht in frage; über wenig war man sich einiger.

Hofstadter: After the defeat, many professors with Pan-Germanistic leanings, who by that time constituted the majority of the faculty, considered it pretty much their duty to protect the institutions of higher learning from “undesirables.” The most likely to be dismissed were young scholars who had not yet earned the right to teach university classes. As for female scholars, well, they had no place in the system at all; nothing was clearer than that.

Google Translate: After the lost war, many German-National professors, meanwhile the majority in the faculty, saw themselves as their duty to keep the universities from the “odd”; Young scientists were most vulnerable before their habilitation. And scientists did not question anyway; There were few of them.

The words in Google Translate’s output are all English words (even if, for unclear reasons, a couple are inappropriately capitalized). So far, so good! But soon it grows wobbly, and the further down you go the wobblier it gets.
I’ll focus first on “the ‘odd.’” This corresponds to the German “die ‘Ungeraden,’” which here means “politically undesirable people.” Google Translate, however, had a reason—a very simple statistical reason—for choosing the word “odd.” Namely, in its huge bilingual database, the word “ungerade” was almost always translated as “odd.” Although the engine didn’t realize why this was the case, I can tell you why. It’s because “ungerade”—which literally means “un-straight” or “uneven”—nearly always means “not divisible by two.” By contrast, my choice of “undesirables” to render “Ungeraden” had nothing to do with the statistics of words, but came from my understanding of the situation—from my zeroing in on a notion not explicitly mentioned in the text and certainly not listed as a translation of “ungerade” in any of my German dictionaries. Let’s move on to the German “Habilitation,” denoting a university status resembling tenure. The English cognate word “habilitation” exists but it is super-rare, and certainly doesn’t bring to mind tenure or anything like it. That’s why I briefly explained the idea rather than just quoting the obscure word, since that mechanical gesture would not get anything across to anglophonic readers. Of course Google Translate would never do anything like this, as it has no model of its readers’ knowledge. The last two sentences really bring out how crucial understanding is for translation. The 15-letter German noun “Wissenschaftler” means either “scientist” or “scholar.” (I opted for the latter, as in this context it was referring to intellectuals in general. Google Translate didn’t get that subtlety.) The related 17-letter noun “Wissenschaftlerin,” found in the closing sentence in its plural form “Wissenschaftlerinnen,” is a consequence of the gendered-ness of German nouns. Whereas the “short” noun is grammatically masculine and thus suggests a male scholar, the longer noun is feminine and applies to females only. I wrote “female scholar” to get the idea across. Google Translate, however, did not understand that the feminizing suffix “-in” was the central focus of attention in the final sentence. Since it didn’t realize that females were being singled out, the engine merely reused the word “scientist,” thus missing the sentence’s entire point. As in the earlier French case, Google Translate didn’t have the foggiest idea that the sole purpose of the German sentence was to shine a spotlight on a contrast between males and females. Aside from that blunder, the rest of the final sentence is a disaster. Take its first half. Is “scientists did not question anyway” really a translation of “Wissenschaftlerinnen kamen sowieso nicht in frage”? It doesn’t mean what the original means—it’s not even in the same ballpark. It just consists of English words haphazardly triggered by the German words. Is that all it takes for a piece of output to deserve the label “translation”? The sentence’s second half is equally erroneous. The last six German words mean, literally, “over little was one more united,” or, more flowingly, “there was little about which people were more in agreement,” yet Google Translate managed to turn that perfectly clear idea into “There were few of them.” We baffled humans might ask “Few of what?” but to the mechanical listener, such a question would be meaningless. Google Translate doesn’t have ideas behind the scenes, so it couldn’t even begin to answer the simple-seeming query. The translation engine was not imagining large or small amounts or numbers of things. 
It was just throwing symbols around, without any notion that they might symbolize something. It’s hard for a human, with a lifetime of experience and understanding and of using words in a meaningful way, to realize how devoid of content all the words thrown onto the screen by Google Translate are. It’s almost irresistible for people to presume that a piece of software that deals so fluently with words must surely know what they mean. This classic illusion associated with artificial-intelligence programs is called the “Eliza effect,” since one of the first programs to pull the wool over people’s eyes with its seeming understanding of English, back in the 1960s, was a vacuous phrase manipulator called Eliza, which pretended to be a psychotherapist, and as such, it gave many people who interacted with it the eerie sensation that it deeply understood their innermost feelings. For decades, sophisticated people—even some artificial-intelligence researchers—have fallen for the Eliza effect. In order to make sure that my readers steer clear of this trap, let me quote some phrases from a few paragraphs up—namely, “Google Translate did not understand,” “it did not realize,” and “Google Translate didn’t have the foggiest idea.” Paradoxically, these phrases, despite harping on the lack of understanding, almost suggest that Google Translate might at least sometimes be capable of understanding what a word or a phrase or a sentence means, or is about. But that isn’t the case. Google Translate is all about bypassing or circumventing the act of understanding language. To me, the word “translation” exudes a mysterious and evocative aura. It denotes a profoundly human art form that graciously carries clear ideas in Language A into clear ideas in Language B, and the bridging act not only should maintain clarity, but also should give a sense for the flavor, quirks, and idiosyncrasies of the writing style of the original author. Whenever I translate, I first read the original text carefully and internalize the ideas as clearly as I can, letting them slosh back and forth in my mind. It’s not that the words of the original are sloshing back and forth; it’s the ideas that are triggering all sorts of related ideas, creating a rich halo of related scenarios in my mind. Needless to say, most of this halo is unconscious. Only when the halo has been evoked sufficiently in my mind do I start to try to express it—to “press it out”—in the second language. I try to say in Language B what strikes me as a natural B-ish way to talk about the kinds of situations that constitute the halo of meaning in question. I am not, in short, moving straight from words and phrases in Language A to words and phrases in Language B. Instead, I am unconsciously conjuring up images, scenes, and ideas, dredging up experiences I myself have had (or have read about, or seen in movies, or heard from friends), and only when this nonverbal, imagistic, experiential, mental “halo” has been realized—only when the elusive bubble of meaning is floating in my brain—do I start the process of formulating words and phrases in the target language, and then revising, revising, and revising. This process, mediated via meaning, may sound sluggish, and indeed, in comparison with Google Translate’s two or three seconds per page, it certainly is—but it is what any serious human translator does. 
This is the kind of thing I imagine when I hear an evocative phrase like “deep mind.” That said, I turn now to Chinese, a language that gave the deep-learning software a far rougher ride than the two European languages did. For my test material, I drew from the touching memoir Women Sa (We Three), written by the Chinese playwright and translator Yang Jiang, who recently died at 104. Her book recounts the intertwined lives of herself, her husband Qian Zhongshu (also a novelist and translator), and their daughter. It is not written in an especially arcane manner, but it uses an educated, lively Chinese. I chose a short passage and let Google Translate loose on it. Here are the results, along with my own translation (again vetted by native speakers of Chinese): Yang: 锺书到清华工作一年后,调任毛选翻译委员会的工作,住在城里,周末回校。 他仍兼管研究生。 毛选翻译委员会的领导是徐永煐同志。介绍锺书做这份工作的是清华同学乔冠华同志。 事定之日,晚饭后,有一位旧友特雇黄包车从城里赶来祝贺。客去后,锺书惶恐地对我说: 他以为我要做“南书房行走”了。这件事不是好做的,不求有功,但求无过。 Hofstadter: After Zhongshu had worked at Tsinghua University for a year, he was transferred to the committee that was translating selected works of Chairman Mao. He lived in the city, but each weekend he would return to school. He also was still supervising his graduate students. The leader of the translation committee of Mao’s works was Comrade Xu Yongying, and the person who had arranged for Zhongshu to do this work was his old Tsinghua schoolmate, Comrade Qiao Guanhua. On the day this appointment was decided, after dinner, an old friend specially hired a rickshaw and came all the way from the city just to congratulate Zhongshu. After our guest had left, Zhongshu turned to me uneasily and said: “He thought I was going to become a ‘South Study special aide.’ This kind of work is not easy. You can’t hope for glory; all you can hope for is to do it without errors.” Google Translate: After a year of work at Tsinghua, he was transferred to the Mao Translating Committee to live in the city and back to school on weekends. He is still a graduate student. The leadership of the Mao Tse Translation Committee is Comrade Xu Yongjian. Introduction to the book to do this work is Tsinghua students Qiao Guanhua comrades. On the day of the event, after dinner, an old friend hired a rickshaw from the city to congratulate. Guest to go, the book of fear in the book said to me: He thought I had to do “South study walking.” This is not a good thing to do, not for meritorious service, but for nothing. I’ll briefly point out a few oddities. First of all, Google Translate never refers to Zhongshu by name, although his name (“锺书”) occurs three times in the original. The first time, the engine uses the pronoun “he”; the second time around it says “the book”; the third time it says “the book of fear in the book.” Go figure! A second oddity is that the first paragraph clearly says that Zhongshu is supervising graduate students, whereas Google Translate turns him into a graduate student. A third oddity is that in the phrase “Mao Tse Translation Committee,” one third of Chairman Mao Tse Tung’s name fell off the train. A fourth oddity is that the name “Yongying” was replaced by “Yongjian.” A fifth oddity is that “after our guest had left” was reduced to “guest to go.” A sixth oddity is that the last sentence makes no sense at all. Well, these six oddities are already quite a bit of humble pie for Google Translate to swallow, but let’s forgive and forget. Instead, I’ll focus in on just one confusing phrase I ran into—a five-character phrase in quotation marks in the last paragraph (“南书房行走”). 
Character for character, it might be rendered as “south book room go walk,” but that jumble is clearly unacceptable, especially as the context requires it to be a noun. Google Translate invented “South study walking,” which is not helpful.

Now I admit that the Chinese phrase was utterly opaque to me. Although literally it looked like it meant something about moving about on foot in a study on the south side of some building, I knew that couldn’t be right; it made no sense in the context. To translate it, I had to find out about something in Chinese culture that I was ignorant of. So where did I turn for help? To Google! (But not to Google Translate.) I typed in the Chinese characters, surrounded them by quote marks, then did a Google search for that exact literal string. Lickety-split, up came a bunch of web pages in Chinese, and then I painfully slogged my way through the opening paragraphs of the first couple of websites, trying to figure out what the phrase was all about.

I discovered that the term dates back to the Qing Dynasty (1644–1911), and refers to an intellectual assistant to the emperor, whose duty was to help the emperor (in the imperial palace’s south study) stylishly craft official statements. The two characters that seem to mean “go walk” actually form a chunk denoting an aide. And so, given that information supplied by Google Search, I came up with my phrase “South Study special aide.”

It’s too bad Google Translate couldn’t avail itself of the services of Google Search as I did, isn’t it? But then again, Google Translate can’t understand web pages, although it can translate them in the twinkling of an eye. Or can it? Below I exhibit the astounding piece of output text that Google Translate super-swiftly spattered across my screen after being fed the opening of the website that I got my info from:

“South study walking” is not an official position, before the Qing era this is just a “messenger,” generally by the then imperial intellectuals Hanlin to serve as. South study in the Hanlin officials in the “select chencai only goods and excellent” into the value, called “South study walking.” Because of the close to the emperor, the emperor’s decision to have a certain influence. Yongzheng later set up “military aircraft,” the Minister of the military machine, full-time, although the study is still Hanlin into the value, but has no participation in government affairs. Scholars in the Qing Dynasty into the value of the South study proud. Many scholars and scholars in the early Qing Dynasty into the south through the study.

Is this actually in English? Of course we all agree that it’s made of English words (for the most part, anyway), but does that imply that it’s a passage in English? To my mind, since the above paragraph contains no meaning, it’s not in English; it’s just a jumble made of English ingredients—a random word salad, an incoherent hodgepodge.

In case you’re curious, here’s my version of the same passage (it took me hours): The nan-shufang-xingzou (“South Study special aide”) was not an official position, but in the early Qing Dynasty it was a special role generally filled by whoever was the emperor’s current intellectual academician.
The group of academicians who worked in the imperial palace’s south study would choose, among themselves, someone of great talent and good character to serve as ghostwriter for the emperor, and always to be at the emperor’s beck and call; that is why this role was called “South Study special aide.” The South Study aide, being so close to the emperor, was clearly in a position to influence the latter’s policy decisions. However, after Emperor Yongzheng established an official military ministry with a minister and various lower positions, the South Study aide, despite still being in the service of the emperor, no longer played a major role in governmental decision-making. Nonetheless, Qing Dynasty scholars were eager for the glory of working in the emperor’s south study, and during the early part of that dynasty, quite a few famous scholars served the emperor as South Study special aides. Some readers may suspect that I, in order to bash Google Translate, cherry-picked passages on which it stumbled terribly, and that it actually does far better on the vast majority of passages. Though that sounds plausible, it’s not the case. Nearly every paragraph I selected from books I’m currently reading gave rise to translation blunders of all shapes and sizes, including senseless and incomprehensible phrases, as above. Of course I grant that Google Translate sometimes comes up with a series of output sentences that sound fine (although they may be misleading or utterly wrong). A whole paragraph or two may come out superbly, giving the illusion that Google Translate knows what it is doing, understands what it is “reading.” In such cases, Google Translate seems truly impressive—almost human! Praise is certainly due to its creators and their collective hard work. But at the same time, don’t forget what Google Translate did with these two Chinese passages, and with the earlier French and German passages. To understand such failures, one has to keep the ELIZA effect in mind. The bailingual engine isn’t reading anything—not in the normal human sense of the verb “to read.” It’s processing text. The symbols it’s processing are disconnected from experiences in the world. It has no memories on which to draw, no imagery, no understanding, no meaning residing behind the words it so rapidly flings around. A friend asked me whether Google Translate’s level of skill isn’t merely a function of the program’s database. He figured that if you multiplied the database by a factor of, say, a million or a billion, eventually it would be able to translate anything thrown at it, and essentially perfectly. I don’t think so. Having ever more “big data” won’t bring you any closer to understanding, since understanding involves having ideas, and lack of ideas is the root of all the problems for machine translation today. So I would venture that bigger databases—even vastly bigger ones—won’t turn the trick. Another natural question is whether Google Translate’s use of neural networks—a gesture toward imitating brains—is bringing us closer to genuine understanding of language by machines. This sounds plausible at first, but there’s still no attempt being made to go beyond the surface level of words and phrases. All sorts of statistical facts about the huge databases are embodied in the neural nets, but these statistics merely relate words to other words, not to ideas. There’s no attempt to create internal structures that could be thought of as ideas, images, memories, or experiences. 
Such mental etherea are still far too elusive to deal with computationally, and so, as a substitute, fast and sophisticated statistical word-clustering algorithms are used. But the results of such techniques are no match for actually having ideas involved as one reads, understands, creates, modifies, and judges a piece of writing. Despite my negativism, Google Translate offers a service many people value highly: It effects quick-and-dirty conversions of meaningful passages written in language A into not necessarily meaningful strings of words in language B. As long as the text in language B is somewhat comprehensible, many people feel perfectly satisfied with the end product. If they can “get the basic idea” of a passage in a language they don’t know, they’re happy. This isn’t what I personally think the word “translation” means, but to some people it’s a great service, and to them it qualifies as translation. Well, I can see what they want, and I understand that they’re happy. Lucky them! I’ve recently seen bar graphs made by technophiles that claim to represent the “quality” of translations done by humans and by computers, and these graphs depict the latest translation engines as being within striking distance of human-level translation. To me, however, such quantification of the unquantifiable reeks of pseudoscience, or, if you prefer, of nerds trying to mathematize things whose intangible, subtle, artistic nature eludes them. To my mind, Google Translate’s output today ranges all the way from excellent to grotesque, but I can’t quantify my feelings about it. Think of my first example involving “his” and “her” items. The idealess program got nearly all the words right, but despite that slight success, it totally missed the point. How, in such a case, should one “quantify” the quality of the job? The use of scientific-looking bar graphs to represent translation quality is simply an abuse of the external trappings of science. Let me return to that sad image of human translators, soon outdone and outmoded, gradually turning into nothing but quality controllers and text tweakers. That’s a recipe for mediocrity at best. A serious artist doesn’t start with a kitschy piece of error-ridden bilgewater and then patch it up here and there to produce a work of high art. That’s not the nature of art. And translation is an art. In my writings over the years, I’ve always maintained that the human brain is a machine—a very complicated kind of machine—and I’ve vigorously opposed those who say that machines are intrinsically incapable of dealing with meaning. There is even a school of philosophers who claim computers could never “have semantics” because they’re made of “the wrong stuff” (silicon). To me, that’s facile nonsense. I won’t touch that debate here, but I wouldn’t want to leave readers with the impression that I believe intelligence and understanding to be forever inaccessible to computers. If in this essay I seem to come across sounding that way, it’s because the technology I’ve been discussing makes no attempt to reproduce human intelligence. Quite the contrary: It attempts to make an end run around human intelligence, and the output passages exhibited above clearly reveal its giant lacunas. From my point of view, there is no fundamental reason that machines could not, in principle, someday think, be creative, funny, nostalgic, excited, frightened, ecstatic, resigned, hopeful, and, as a corollary, able to translate admirably between languages. 
There’s no fundamental reason that machines might not someday succeed smashingly in translating jokes, puns, screenplays, novels, poems, and, of course, essays like this one. But all that will come about only when machines are as filled with ideas, emotions, and experiences as human beings are. And that’s not around the corner. Indeed, I believe it is still extremely far away. At least that is what this lifelong admirer of the human mind’s profundity fervently hopes. When, one day, a translation engine crafts an artistic novel in verse in English, using precise rhyming iambic tetrameter rich in wit, pathos, and sonic verve, then I’ll know it’s time for me to tip my hat and bow out.

This article originally misstated the number of languages for which the deep-learning version of Google Translate is available. We regret the error.

Douglas Hofstadter is a professor of cognitive science and comparative literature at Indiana University at Bloomington. He is the author of Gödel, Escher, Bach.

Machine Translation: Mining Text for Social Theory
James A. Evans and Pedro Aceves
Annual Review of Sociology, Vol. 42:21-50 (July 2016). First published online as a Review in Advance on June 1, 2016. https://doi.org/10.1146/annurev-soc-081715-074206

Abstract: More of the social world lives within electronic text than ever before, from collective activity on the web, social media, and instant messaging to online transactions, government intelligence, and digitized libraries. This supply of text has elicited demand for natural language processing and machine learning tools to filter, search, and translate text into valuable data. We survey some of the most exciting computational approaches to text analysis, highlighting both supervised methods that extend old theories to new data and unsupervised techniques that discover hidden regularities worth theorizing. We then review recent research that uses these tools to develop social insight by exploring (a) collective attention and reasoning through the content of communication; (b) social relationships through the process of communication; and (c) social states, roles, and moves identified through heterogeneous signals within communication. We highlight social questions for which these advances could offer powerful new insight.

Keywords: content analysis, big data, natural language processing, machine learning, text analysis, computational methods, grounded theory

Machine translation
From Wikipedia, the free encyclopedia

Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT) or interactive translation), is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed.
Solving this problem with corpus statistical and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.^[1]

Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text. Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are proper names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).

The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality, first and most notably Yehoshua Bar-Hillel.^[2] Some critics claim that there are in-principle obstacles to automating the translation process.^[3]

History
Main article: History of machine translation

The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol.^[4] The field of "machine translation" appeared in Warren Weaver's Memorandum on Translation (1949).^[5] The first researcher in the field, Yehoshua Bar-Hillel, began his research at MIT (1951). A Georgetown University MT research team followed (1951) with a public demonstration of its Georgetown-IBM experiment system in 1954. MT research programs popped up in Japan^[6]^[7] and Russia (1955), and the first MT conference was held in London (1956).^[8]^[9] Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced.^[10] According to a 1972 report by the Director of Defense Research and Engineering (DDR&E), the feasibility of large-scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during that conflict.

The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971); and Xerox used SYSTRAN to translate technical manuals (1978). Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation. Various MT companies were launched, including Trados (1984), which was the first to develop and market translation memory technology (1989).
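(Translation memory, mentioned just above, is conceptually simple: store previously translated sentence pairs and, for each new source sentence, retrieve the stored sentence that most resembles it, so the translator can reuse or adapt its stored translation. Here is a minimal sketch in Python; the sentence pairs and the 0.7 cutoff are invented for illustration.)

# Minimal sketch of translation-memory lookup: fuzzy-match a new
# sentence against previously translated (source, target) pairs.
from difflib import SequenceMatcher

memory = [  # invented example pairs
    ("The engine must be serviced every 500 hours.",
     "Le moteur doit être révisé toutes les 500 heures."),
    ("Check the oil level before starting.",
     "Vérifiez le niveau d'huile avant le démarrage."),
]

def best_match(sentence, threshold=0.7):
    scored = [(SequenceMatcher(None, sentence, src).ratio(), src, tgt)
              for src, tgt in memory]
    score, src, tgt = max(scored)
    # Below the threshold the match is too loose to reuse, and the
    # translator starts from scratch instead.
    return (src, tgt, round(score, 2)) if score >= threshold else None

print(best_match("The engine must be serviced every 250 hours."))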
The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991). MT on the web started with SYSTRAN offering free translation of small texts (1996), followed by AltaVista Babelfish, which racked up 500,000 requests a day (1997). Franz-Josef Och (the future head of Translation Development at Google) won DARPA's speed MT competition (2003). More innovations during this time included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobiles in Japan (2008), and a mobile phone with built-in speech-to-speech translation functionality for English, Japanese and Chinese (2009). In 2012, Google announced that Google Translate translates roughly enough text to fill 1 million books in one day.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. Warren Weaver wrote an important memorandum, "Translation," in 1949. The Georgetown experiment was by no means the first such application; a demonstration of a rudimentary translation of English into French was made in 1954 on the APEXC machine at Birkbeck College (University of London). Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Translation process
Main article: Translation process

The human translation process may be described as:
1. Decoding the meaning of the source text; and
2. Re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that sounds as if it has been written by a person. In its most general application, this is beyond current technology. Though it works much faster, no automated translation program or procedure, with no human participation, can produce output even close to the quality a human translator can produce. What it can do, however, is provide a general, though imperfect, approximation of the original text, getting the "gist" of it (a process called "gisting"). This is sufficient for many purposes, including making best use of the finite and expensive time of a human translator, reserved for those cases in which total accuracy is indispensable.

This problem may be approached in a number of ways, through the evolution of which accuracy has improved.

Approaches

(Figure: Bernard Vauquois' pyramid, showing comparative depths of intermediary representation, with interlingual machine translation at the peak, followed by transfer-based, then direct translation.)
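(To make the bottom of the pyramid concrete, here is a deliberately naive sketch, in Python, of direct translation: pure word-for-word dictionary substitution. The tiny English-to-French glossary is invented for illustration; the output shows why recognition of whole phrases, word order, and agreement is needed.)

# Deliberately naive "direct" translation: word-for-word substitution.
# The tiny English-to-French glossary is invented for illustration.
glossary = {
    "the": "le", "black": "noir", "cat": "chat",
    "sees": "voit", "white": "blanc", "dog": "chien",
}

def direct_translate(sentence: str) -> str:
    words = sentence.lower().rstrip(".").split()
    return " ".join(glossary.get(w, w) for w in words)

print(direct_translate("The black cat sees the white dog."))
# Prints "le noir chat voit le blanc chien": the French word order
# (these adjectives follow the noun) and gender agreement are lost,
# which is why substitution alone cannot produce a good translation.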
Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable words of the target language will replace the ones in the source language. It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.^[11]

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use. To translate between closely related languages, the technique referred to as rule-based machine translation may be used.

Rule-based
Main article: Rule-based machine translation

The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms. This type of translation is used mostly in the creation of dictionaries and grammar programs. Unlike other methods, RBMT involves more information about the linguistics of the source and target languages, using the morphological and syntactic rules and semantic analysis of both languages. The basic approach involves linking the structure of the input sentence with the structure of the output sentence using a parser and an analyzer for the source language, a generator for the target language, and a transfer lexicon for the actual translation. RBMT's biggest downfall is that everything must be made explicit: orthographical variation and erroneous input must be made part of the source language analyser in order to cope with it, and lexical selection rules must be written for all instances of ambiguity. Adapting to new domains in itself is not that hard, as the core grammar is the same across domains, and the domain-specific adjustment is limited to lexical selection adjustment.

Transfer-based machine translation
Main article: Transfer-based machine translation

Transfer-based machine translation is similar to interlingual machine translation in that it creates a translation from an intermediate representation that simulates the meaning of the original sentence. Unlike interlingual MT, it depends partially on the language pair involved in the translation.

Interlingual
Main article: Interlingual machine translation

Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual language, i.e. a "language neutral" representation that is independent of any language. The target language is then generated out of the interlingua.
One of the major advantages of this system is that the interlingua becomes more valuable as the number of target languages it can be turned into increases. However, the only interlingual machine translation system that has been made operational at the commercial level is the KANT system (Nyberg and Mitamura, 1992), which is designed to translate Caterpillar Technical English (CTE) into other languages.

Dictionary-based
Main article: Dictionary-based machine translation

Machine translation can use a method based on dictionary entries, which means that the words will be translated as they are by a dictionary.

Statistical
Main article: Statistical machine translation

Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, good results can be achieved translating similar texts, but such corpora are still rare for many language pairs. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007.^[12] In 2005, Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system; translation accuracy improved.^[13] Google Translate and similar statistical translation programs work by detecting patterns in hundreds of millions of documents that have previously been translated by humans and making intelligent guesses based on the findings. Generally, the more human-translated documents available in a given language, the more likely it is that the translation will be of good quality.^[14] Newer approaches in statistical machine translation, such as METIS II and PRESEMT, use minimal corpus size and instead focus on derivation of syntactic structure through pattern recognition. With further development, this may allow statistical machine translation to operate off of a monolingual text corpus.^[15] SMT's biggest downfalls include its dependence on huge amounts of parallel texts, its problems with morphology-rich languages (especially with translating into such languages), and its inability to correct singleton errors.

Example-based
Main article: Example-based machine translation

The example-based machine translation (EBMT) approach was proposed by Makoto Nagao in 1984.^[16]^[17] Example-based machine translation is based on the idea of analogy. In this approach, the corpus that is used is one that contains texts that have already been translated. Given a sentence that is to be translated, sentences from this corpus are selected that contain similar sub-sentential components.^[18] The similar sentences are then used to translate the sub-sentential components of the original sentence into the target language, and these phrases are put together to form a complete translation.

Hybrid MT
Main article: Hybrid machine translation

Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies.^[19] Several MT organizations (such as Omniscien Technologies (formerly Asia Online), LinguaSys, Systran, and Polytechnic University of Valencia) claim a hybrid approach that uses both rules and statistics.
The approaches differ in a number of ways:

Rules post-processed by statistics: Translations are performed using a rules-based engine. Statistics are then used in an attempt to adjust/correct the output from the rules engine.

Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach has much more power, flexibility and control when translating. It also provides extensive control over the way in which the content is processed during both pre-translation (e.g. markup of content and non-translatable terms) and post-translation (e.g. post-translation corrections and adjustments).

More recently, with the advent of neural MT, a new version of hybrid machine translation is emerging that combines the benefits of rules, statistical and neural machine translation. The approach allows benefitting from pre- and post-processing in a rule-guided workflow as well as benefitting from NMT and SMT. The downside is the inherent complexity, which makes the approach suitable only for specific use cases. One of the proponents of this approach for complex use cases is Omniscien Technologies.

Neural MT
Main article: Neural machine translation

A deep-learning-based approach to MT, neural machine translation has made rapid progress in recent years, and Google has announced that its translation services are now using this technology in preference to its previous statistical methods.^[20] Other providers, including Pangeanic^[21], KantanMT^[22], Omniscien Technologies^[23] and SDL^[24], have announced the deployment of neural machine translation technology in 2017 as well.

Major issues

(Figure: Broken Chinese "沒有進入" produced by machine translation on a sign in Bali, Indonesia; the broken sentence sounds like "there does not exist an entry" or "have not entered yet." Machine translation can produce such non-understandable phrases.)

Disambiguation
Main articles: Word sense disambiguation and Syntactic disambiguation

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel.^[25] He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.^[26] Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches. Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.^[27]

Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

Why does a translator need a whole workday to translate five pages, and not an hour or two? … About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve.
For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.^[28]

The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Non-standard speech

One of the major pitfalls of MT is its inability to translate non-standard language with the same accuracy as standard language. Heuristic or statistics-based MT takes input from various sources in the standard form of a language. Rule-based translation, by nature, does not include common non-standard usages. This causes errors in translation from a vernacular source or into colloquial language. Limitations on translation from casual speech present issues in the use of machine translation in mobile devices.

Named entities

Named-entity handling in MT is related to named entity recognition in information extraction. Named entities, in a narrow sense, refer to concrete or abstract entities in the real world, including people, organizations, companies, and places; the term also covers expressions of time, space, and quantity, such as "1 July 2011" or "$79.99".^[29]

Named entities occur in the text being analyzed in statistical machine translation. The initial difficulty that arises in dealing with named entities is simply identifying them in the text. Consider the list of names common in a particular language to illustrate this: the most common names are different for each language and are also constantly changing. If named entities cannot be recognized by the machine translator, they may be erroneously translated as common nouns, which would most likely not affect the BLEU rating of the translation but would change the text's human readability.^[30] It is also possible that, when not identified, named entities will be omitted from the output translation, which would also have implications for the text's readability and message.

Another way to deal with named entities is to use transliteration instead of translation, meaning that you find the letters in the target language that most closely correspond to the name in the source language. There have been attempts to incorporate this into machine translation by adding a transliteration step into the translation procedure. However, these attempts still have their problems and have even been cited as worsening the quality of translation.^[31] Named entities were still identified incorrectly, with words not being transliterated when they should be, or being transliterated when they shouldn't be. For example, for "Southern California" the first word should be translated directly, while the second word should be transliterated.
The lack of attention to the issue of named entity translation has been recognized as stemming potentially from a lack of resources to devote to the task, in addition to the complexity of creating a good system for it. One approach to named entity translation has been to transliterate, rather than translate, those words. A second is to create a "do-not-translate" list, which has the same end goal: transliteration as opposed to translation.^[32] Both of these approaches still rely on the correct identification of named entities, however.

A third approach is a class-based model. Named entities are replaced with a token representing the class they belong to; for example, "Ted" and "Erica" would both be replaced with a "person" class token. In this way the statistical distribution and use of person names in general can be analyzed, instead of looking at the distributions of "Ted" and "Erica" individually. A problem the class-based model solves is that the probability of a given name in a specific language will not distort the assigned probability of a translation. A Stanford study on improving this area of translation gives the example that different probabilities will be assigned to "David is going for a walk" and "Ankit is going for a walk" with English as the target language, owing to the different number of occurrences of each name in the training data. A frustrating outcome of the same study (and of other attempts to improve named entity translation) is that including methods for named entity translation will often decrease the BLEU scores of the translation.^[32]

Translation from multiparallel sources[edit]
Some work has been done in the utilization of multiparallel corpora, that is, a body of text that has been translated into three or more languages. Using these methods, a text that has been translated into two or more languages may be utilized in combination to provide a more accurate translation into a third language than if just one of those source languages were used alone.^[33]^[34]^[35]

Ontologies in MT[edit]
An ontology is a formal representation of knowledge that includes the concepts (such as objects and processes) in a domain and some relations between them. If the stored information is of a linguistic nature, one can speak of a lexicon.^[36] In NLP, ontologies can be used as a source of knowledge for machine translation systems. With access to a large knowledge base, systems can be enabled to resolve many ambiguities (especially lexical ones) on their own. In the following classic examples, as humans we are able to interpret the prepositional phrase according to the context, because we use our world knowledge, stored in our lexicons: "I saw a man/star/molecule with a microscope/telescope/binoculars."^[36] A machine translation system would initially be unable to differentiate between the meanings, because the syntax does not change. With a large enough ontology as a source of knowledge, however, the possible interpretations of ambiguous words in a specific context can be reduced.
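A small, runnable taste of using a lexical knowledge base this way is NLTK's implementation of the Lesk algorithm, which scores each WordNet sense of a word by the overlap between its dictionary gloss and the surrounding sentence. This is a shallow dictionary-overlap heuristic, not the full ontology-driven reasoning described above, and it often guesses wrong; it merely shows the mechanics.

```python
# Disambiguating "bank" with NLTK's Lesk implementation over WordNet.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.wsd import lesk

sent1 = "I deposited the cheque at the bank".split()
sent2 = "We fished from the grassy bank of the river".split()

for sent in (sent1, sent2):
    sense = lesk(sent, "bank")          # returns the best-overlapping Synset
    print(sense, "->", sense.definition())
```

The limits show quickly: Lesk has no world knowledge beyond the glosses, so sentences whose context words never appear in any definition defeat it, which is exactly why larger ontologies with explicit relations are attractive.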
Other areas of usage for ontologies within NLP include information retrieval, information extraction, and text summarization.^[36]

Building ontologies[edit]
The ontology generated for the PANGLOSS knowledge-based machine translation system in 1993 may serve as an example of how an ontology for NLP purposes can be compiled:^[37]

A large-scale ontology is necessary for parsing in the active modules of the machine translation system. In the PANGLOSS example, about 50,000 nodes were intended to be subsumed under the smaller, manually built upper (abstract) region of the ontology. Because of its size, it had to be created automatically. The goal was to merge two resources, LDOCE online and WordNet, so as to combine the benefits of both: concise definitions from Longman, and semantic relations from WordNet allowing for semi-automatic taxonomization into the ontology.

+ A definition match algorithm was created to automatically merge the correct meanings of ambiguous words between the two online resources, based on the words that the definitions of those meanings have in common in LDOCE and WordNet. Using a similarity matrix, the algorithm delivered matches between meanings, including a confidence factor. This algorithm alone, however, did not match all meanings correctly on its own.
+ A second, hierarchy match algorithm was therefore created, which uses the taxonomic hierarchies found in WordNet (deep hierarchies) and partially in LDOCE (flat hierarchies). This works by first matching unambiguous meanings, then limiting the search space to only the respective ancestors and descendants of those matched meanings. Thus the algorithm matched locally unambiguous meanings: for instance, while the word "seal" as such is ambiguous, there is only one meaning of "seal" within the animal subhierarchy.

Both algorithms complemented each other and helped construct a large-scale ontology for the machine translation system. The WordNet hierarchies, coupled with the matching definitions of LDOCE, were subordinated to the ontology's upper region. As a result, the PANGLOSS MT system was able to make use of this knowledge base, mainly in its generation element. A toy rendering of the definition-match idea follows.
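As promised, here is a toy rendering of the definition-match idea: two senses are matched by the overlap of the content words of their definitions, yielding a score that can serve as a confidence factor. The glosses below are invented stand-ins, not the real LDOCE or WordNet entries.

```python
# Toy "definition match": score candidate sense pairings by the Jaccard
# overlap of the content words in their dictionary definitions.
STOPWORDS = {"a", "an", "the", "of", "or", "that", "to", "in", "on", "and", "any"}

def content_words(definition):
    return {w for w in definition.lower().split() if w not in STOPWORDS}

def match_score(def_a, def_b):
    """Jaccard overlap between the content words of two definitions."""
    a, b = content_words(def_a), content_words(def_b)
    return len(a & b) / len(a | b)

ldoce_seal = "a large sea animal that eats fish and lives in coastal waters"
wn_seal_animal = "any of numerous marine mammals living in coastal waters that eat fish"
wn_seal_stamp = "a device incised to make an impression on wax or paper"

for name, gloss in [("animal", wn_seal_animal), ("stamp", wn_seal_stamp)]:
    print(name, round(match_score(ldoce_seal, gloss), 2))
# animal 0.23 (shares "fish", "coastal", "waters"); stamp 0.0
```

Overlap alone cannot match everything, which is precisely why PANGLOSS added the second, hierarchy-based algorithm on top of it.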
Applications[edit]
While no system provides the holy grail of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output.^[38]^[39]^[40] The quality of machine translation is substantially improved if the domain is restricted and controlled.^[41] Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. The MOLTO project, for example, coordinated by the University of Gothenburg, received more than 2.375 million euros in EU project support to create a reliable translation tool covering a majority of the EU languages.^[42] The further development of MT systems comes at a time when budget cuts in human translation may increase the EU's dependency on reliable MT programs.^[43] The European Commission contributed 3.072 million euros (via its ISA programme) to the creation of MT@EC, a statistical machine translation program tailored to the administrative needs of the EU, replacing a previous rule-based machine translation system.^[44]

In 2005, Google claimed that promising results had been obtained using a proprietary statistical machine translation engine.^[45] In tests conducted by the National Institute of Standards and Technology in summer 2006, the statistical translation engine used in the Google language tools for Arabic-English and Chinese-English achieved an overall BLEU-4 score of 0.4281, ahead of runner-up IBM's 0.3954.^[46]^[47]^[48]

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel^[49] (a venture capital fund, largely funded by the US intelligence community to stimulate new technologies through private-sector entrepreneurs) brought up companies like Language Weaver. Currently the military community is interested in the translation and processing of languages such as Arabic, Pashto, and Dari.^[citation needed] Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile-phone apps.^[50] The Information Processing Technology Office at DARPA hosts programs like TIDES and the Babylon translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.^[51]

The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software: in social networking utilities and in instant messaging clients such as Skype, Google Talk, and MSN Messenger, allowing users speaking different languages to communicate with each other. Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, and PDAs. Due to their portability, such instruments have come to be designated as mobile translation tools, enabling mobile business networking between partners speaking different languages, facilitating foreign-language learning, and enabling unaccompanied travel to foreign countries without the intermediation of a human translator.

Despite being labelled an unworthy competitor to human translation in 1966 by the Automated Language Processing Advisory Committee put together by the United States government,^[52] the quality of machine translation has now improved to such levels that its application in online collaboration and in the medical field is being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise from the importance of accurate translations in medical diagnoses.^[53]

Evaluation[edit]
Main article: Evaluation of machine translation
There are many factors that affect how machine translation systems are evaluated.
These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process. Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English-to-French translation, EBMT performed better.^[54] The same concept applies to technical documents, which, because of their formal language, can be more easily translated by SMT. In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine translation system has produced satisfactory translations that require no human intervention save for quality inspection.^[55]

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges^[56] to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method for comparing different systems, such as rule-based and statistical systems.^[57] Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR.^[58]

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded, and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.^[59] The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing needed to provide input for machine-translation software such that the output will not be meaningless.^[60]

In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine-translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases.^[54] The optimal level of training data seems to be just over 100,000 sentences, possibly because as the training data grows, the number of possible sentences grows too, making it harder to find an exact translation match.

Using machine translation as a teaching tool[edit]
Although there have been concerns about machine translation's accuracy, Dr. Ana Nino of the University of Manchester has researched some of the advantages of utilizing machine translation in the classroom. One such pedagogical method is called "MT as a Bad Model."^[61] MT as a Bad Model forces the language learner to identify inconsistencies or incorrect aspects of a translation; in turn, the individual will (it is hoped) gain a better grasp of the language. Dr. Nino notes that this teaching tool was implemented in the late 1980s.
At the end of various semesters, Dr. Nino was able to obtain survey results from students who had used MT as a Bad Model (as well as other models). Overwhelmingly, students felt that they had observed improved comprehension, lexical retrieval, and increased confidence in their target language.^[61]

Machine translation and signed languages[edit]
Main article: Machine translation of sign languages
In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages than in signed languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.^[62] Researchers Zhao et al. (2000) developed a prototype called TEAM (translation from English to ASL by machine) that performed English to American Sign Language (ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of those signs. Once the entire text was analyzed and the signs necessary to complete the translation were located in the synthesizer, a computer-generated human appeared and used ASL to sign the English text to the user.^[62]

Copyright[edit]
Only works that are original are subject to copyright protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve creativity.^[63] The copyright at issue is for a derivative work; the author of the original work in the original language does not lose his rights when a work is translated: a translator must have permission to publish a translation.

See also[edit]
Comparison of machine translation applications
Statistical machine translation
Controlled language in machine translation
Cache language model
Computational linguistics
Universal Networking Language
Computer-assisted translation and Translation memory
Foreign language writing aid
Controlled natural language
Fuzzy matching
Postediting
History of machine translation
Human language technology
Humour in translation ("howlers")
Language and Communication Technologies
Language barrier
List of emerging technologies
List of research laboratories for machine translation
Neural machine translation
Pseudo-translation
Round-trip translation
Translation
Translation memory
Universal translator
Phraselator
Mobile translation
ULTRA (machine translation system)
Comparison of different machine translation approaches
OpenLogos

Notes[edit]
1. ^ Albat, Thomas Fritz. "Systems and Methods for Automatically Estimating a Translation Time." US Patent 0185235, 19 July 2012.
2. ^ Bar-Hillel, Yehoshua (1964). Language and Information: Selected Essays on Their Theory and Application. Reading, MA: Addison-Wesley. pp. 174–179.
3. ^ "Madsen, Mathias: The Limits of Machine Translation (2010)". Docs.google.com. Retrieved 2012-06-12.
4. ^ 浜口稔 (30 April 1993). 英仏普遍言語計画 [Universal Language Schemes in England and France] (in Japanese). 工作舎. pp. 70–71. ISBN 978-4-87502-214-5.
"普遍的文字の構築という初期の試みに言及するときは1629年11月にデカルトがメルセンヌに宛てた手紙から始まる、というのが通り相場とな っている。しかし、この問題への関心を最初に誘発した多くの要因を吟味してみると、ある種の共通の書字という構想は明らかに、ずっと以前から比 較的なじみ深いものになっていたようである。…フランシス・ベイコンは、1605年出版の学問の進歩についてのなかで、そのような真正の文字の 体系は便利であると述べていた"translated from Knowlson, James. UNIVERSAL LANGUAGE SCHEMES IN ENGLAND AND FRANCE 1600-1800. ^ Delavenay, Émile. LA MACHINE A TRADUIRE (Collection QUE SAIS-JE? No.834). Translated by 別所照彦. Presses Universitaires de France. "英国人A.D.ブースとロックフェラー財団のワレン・ウィーバーとが同時に翻訳問題に手をつけたのは1946年のことであった。(translati on (assisted by Google translate):It was in 1946 when the English A. D. Booth and Warren Weaver at Rockefeller Foundation begun to study the issue on translation at the same time.)" ^ 上野, 俊夫 (1986-08-13). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X. "わが国では1956年、当時の電気試験所が英和翻訳専用機「ヤマト」を実験している。この機械は1962年頃には中学1年の教科書で90点以上の能力 に達したと報告されている。(translation (assisted by Google translate): In 1959 Japan, the National Institute of Advanced Industrial Science and Technology(AIST) tested the proper English-Japanese translation machine Yamato, which reported in 1964 as that reached the power level over the score of 90-point on the textbook of 1st grade of junior hi-school.)" ^ http://museum.ipsj.or.jp/computer/dawn/0027.html ^ Nye, Mary Jo (2016). "Speaking in Tongues: Science's centuries-long hunt for a common language". Distillations. 2 (1): 40–43. Retrieved 20 March 2018. ^ Gordin, Michael D. (2015). Scientific Babel: How Science Was Done Before and After Global English. Chicago, Illinois: University of Chicago Press. ISBN 9780226000299. ^ 上野, 俊夫 (1986-08-13). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X. ^ John Lehrberger (1988). Machine Translation: Linguistic Characteristics of MT Systems and General Methodology of Evaluation. John Benjamins Publishing. ISBN 90-272-3124-9. ^ Chitu, Alex (22 October 2007). "Google Switches to Its Own Translation System". Googlesystem.blogspot.com. Retrieved 2012-08-13. ^ "Google Translator: The Universal Language". Blog.outer-court.com. 25 January 2007. Retrieved 2012-06-12. ^ "Inside Google Translate – Google Translate". ^ http://www.mt-archive.info/10/HyTra-2013-Tambouratzis.pdf ^ Nagao, M. 1981. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, in Artificial and Human Intelligence, A. Elithorn and R. Banerji (eds.) North- Holland, pp. 173–180, 1984. ^ "the Association for Computational Linguistics – 2003 ACL Lifetime Achievement Award". Association for Computational Linguistics. Retrieved 2010-03-10. ^ http://kitt.cl.uzh.ch/clab/satzaehnlichkeit/tutorial/Unterlagen/Somers1 999.pdf ^ Adam Boretz. "Boretz, Adam, "AppTek Launches Hybrid Machine Translation Software" SpeechTechMag.com (posted 2 MAR 2009)". Speechtechmag.com. Retrieved 2012-06-12. ^ "Google's neural network learns to translate languages it hasn't been trained on". ^ "EU Spends EUR 1.9m to Customize MT for State and Regional Authorities | Slator". Slator. 2017-07-09. Retrieved 2017-07-09. ^ "KantanMT Users Can Now Customise and Deploy Neural Machine Translation Engines | Slator". Slator. 2017-03-13. Retrieved 2017-06-23. ^ "Omniscien Technologies Announces Release of Language Studio™ with Next-Generation NMT Technology | Slator". Slator. 2017-04-21. Retrieved 2017-06-23. ^ Rowe, Sam Del (2017-06-12). "SDL Adds Neural Machine Translation to Its Enterprise Translation Server". CRM Magazine. Retrieved 2017-06-23. 
25. ^ Hutchins, John. "Milestones in machine translation – No. 6: Bar-Hillel and the nonfeasibility of FAHQT". Archived 12 March 2007 at the Wayback Machine.
26. ^ Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf
27. ^ Costa-jussà, Marta R.; Rapp, Reinhard; Lambert, Patrik; Eberle, Kurt; Banchs, Rafael E.; Babych, Bogdan (eds.). Hybrid Approaches to Machine Translation. Switzerland. ISBN 9783319213101. OCLC 953581497.
28. ^ Piron, Claude. Le défi des langues (The Language Challenge). Paris: L'Harmattan, 1994.
29. ^ 张政 (2010). 计算机语言学与机器翻译导论 [Introduction to Computational Linguistics and Machine Translation]. 外语教学与研究出版社 (Foreign Language Teaching and Research Press).
30. ^ http://www.cl.cam.ac.uk/~ar283/eacl03/workshops03/W03-w1_eacl03babych.local.pdf
31. ^ Hermjakob, U.; Knight, K.; Daumé III, H. (2008). "Name Translation in Statistical Machine Translation: Learning When to Transliterate". Association for Computational Linguistics. pp. 389–397.
32. ^ a b http://nlp.stanford.edu/courses/cs224n/2010/reports/singla-nirajuec.pdf
33. ^ https://dowobeha.github.io/papers/amta08.pdf
34. ^ http://homepages.inf.ed.ac.uk/mlap/Papers/acl07.pdf
35. ^ https://www.jair.org/media/3540/live-3540-6293-jair.pdf
36. ^ a b c Vossen, Piek: "Ontologies". In: Mitkov, Ruslan (ed.) (2003): Handbook of Computational Linguistics, Chapter 25. Oxford: Oxford University Press.
37. ^ Knight, Kevin (1993). "Building a large ontology for machine translation" (PDF). Retrieved 7 September 2014.
38. ^ Melby, Alan (1995). The Possibility of Language. Amsterdam: Benjamins, pp. 27–41. Benjamins.com. Retrieved 2012-06-12.
39. ^ Wooten, Adam (14 February 2006). "A Simple Model Outlining Translation Technology". T&I Business. Tandibusiness.blogspot.com. Retrieved 2012-06-12.
40. ^ "Appendix III of 'The present status of automatic translation of languages', Advances in Computers, vol. 1 (1960), pp. 158–163. Reprinted in Y. Bar-Hillel: Language and Information (Reading, Mass.: Addison-Wesley, 1964), pp. 174–179" (PDF). Retrieved 2012-06-12.
41. ^ "Human quality machine translation solution by Ta with you" (in Spanish). Tauyou.com. 15 April 2009. Retrieved 2012-06-12.
42. ^ "molto-project.eu". Retrieved 2012-06-12.
43. ^ Spiegel Online, Hamburg (13 September 2013). "Google Translate Has Ambitious Goals for Machine Translation".
44. ^ "Machine Translation Service". 5 August 2011.
45. ^ Google Blog: "The machines do the translating" (by Franz Och).
46. ^ Geer, David. "Statistical Translation Gains Respect". IEEE Computer, October 2005, pp. 18–21. doi:10.1109/MC.2005.353. Retrieved 2012-06-12.
47. ^ Ratcliff, Evan. "Me Translate Pretty One Day". Wired, December 2006. Retrieved 2012-06-12.
48. ^ "NIST 2006 Machine Translation Evaluation Official Results", 1 November 2006. Itl.nist.gov. Retrieved 2012-06-12.
49. ^ "In-Q-Tel". Archived from the original on 20 May 2016. Retrieved 12 June 2012.
50. ^ Gallafent, Alex (26 April 2011). "Machine Translation for the Military". PRI's The World. Retrieved 17 September 2013.
51. ^ Jackson, William (9 September 2003). "Air force wants to build a universal translator". GCN. Retrieved 2012-06-12.
52. ^ http://www.nap.edu/html/alpac_lm/ARC000005.pdf
53. ^ "Using machine translation in clinical practice".
54. ^ a b Way, Andy; Gough, Nano (20 September 2005). "Comparing Example-Based and Statistical Machine Translation". Natural Language Engineering 11 (3): 295–309. doi:10.1017/S1351324905003888. Retrieved 2014-03-23.
55. ^ Muegge, Uwe (2006). "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study". In Translating and the Computer 28: Proceedings of the twenty-eighth international conference on translating and the computer, 16–17 November 2006, London. London: Aslib. ISBN 978-0-85142-483-5.
56. ^ "Comparison of MT systems by human evaluation, May 2008". Morphologic.hu. Archived from the original on 19 April 2012. Retrieved 12 June 2012.
57. ^ Anderson, D. D. (1995). "Machine translation as a tool in second language learning". CALICO Journal 13 (1): 68–96.
58. ^ Han et al. (2012). "LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors". In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pp. 441–450, Mumbai, India.
59. ^ J. M. Cohen observes (p. 14): "Scientific translation is the aim of an age that would reduce all activities to techniques. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."
60. ^ See the annually performed NIST tests since 2001, and Bilingual Evaluation Understudy.
61. ^ a b Nino, Ana. "Machine Translation in Foreign Language Learning: Language Learners' and Tutors' Perceptions of Its Advantages and Disadvantages". ReCALL: the Journal of EUROCALL 21 (2) (May 2009): 241–258.
62. ^ a b Zhao, L.; Kipper, K.; Schuler, W.; Vogler, C.; Palmer, M. (2000). "A Machine Translation System from English to American Sign Language". Lecture Notes in Computer Science 1934: 54–67.
63. ^ "Machine Translation: No Copyright On The Result?". SEO Translator, citing Zimbabwe Independent. Retrieved 24 November 2012.

Further reading[edit]
Cohen, J. M. (1986), "Translation", Encyclopedia Americana, 27, pp. 12–15.
Hutchins, W. John; Somers, Harold L. (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.
Lewis-Kraus, Gideon, "Tower of Babble", New York Times Magazine, June 7, 2015, pp. 48–52.

External links[edit]
The Advantages and Disadvantages of Machine Translation
International Association for Machine Translation (IAMT)
Machine Translation Archive by John Hutchins.
An electronic repository (and bibliography) of articles, books, and papers in the field of machine translation and computer-based translation technology.
Machine translation (computer-based translation) – publications by John Hutchins (includes PDFs of several books on machine translation).
Machine Translation and Minority Languages, John Hutchins, 1999.
What is Machine Translation?
Machine translation is the translation of text by a computer, with no human involvement. Pioneered in the 1950s, it is also referred to as automated, automatic, or instant translation.

Increase productivity and translate faster
Using machine translation as part of the SDL Trados Studio environment, you can translate more content and deliver it faster than before. SDL Trados Studio includes support for several machine translation engines.

How does machine translation work?
There are three types of machine translation system: rules-based, statistical, and neural.
Rules-based systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries are created to focus on certain industries or disciplines. Rules-based systems typically deliver consistent translations with accurate terminology when trained with specialist dictionaries.
Statistical systems have no knowledge of language rules. Instead, they "learn" to translate by analysing large amounts of data for each language pair. They can be trained for specific industries or disciplines using additional data relevant to the sector. Typically, statistical systems deliver more fluent-sounding but less consistent translations.
Neural machine translation (NMT) makes machines learn to translate through one large neural network (multiple processing devices modeled on the brain). The approach has become increasingly popular among MT researchers and developers, as trained NMT systems have begun to show better translation performance in many language pairs than the phrase-based statistical approach.

When would I use machine translation?
When translating with SDL Trados Studio, any segments not leveraged from the translation memory can automatically be machine translated for a translator to review, then accept or amend if necessary, or translate manually instead. A translator can configure which machine translation engine to use and how much it is used. A sketch of this fallback logic appears below.

Respecting client confidentiality
If the projects you work on are commercially sensitive, your customer may require that information not be disclosed to any third parties. Carefully consider how and when to use machine translation, as you could be sharing segments of the source text with a third party. SDL Trados Studio automatically generates audit files that record the use of machine translation.
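The fallback logic just described can be sketched in a few lines. Everything here is hypothetical: the memory contents, the match threshold, and the machine_translate() stand-in. difflib's SequenceMatcher merely approximates the fuzzy matching a real CAT tool performs; this is not SDL's actual API.

```python
# Sketch of CAT-tool pre-translation: use the translation memory when a
# sufficiently close match exists, otherwise fall back to machine
# translation and flag the segment for human post-editing.
from difflib import SequenceMatcher

translation_memory = {
    "Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern.",
}

def fuzzy_match(segment, threshold=0.85):
    """Return the best TM hit at or above the threshold, if any."""
    best = max(
        translation_memory,
        key=lambda src: SequenceMatcher(None, segment, src).ratio(),
        default=None,
    )
    if best and SequenceMatcher(None, segment, best).ratio() >= threshold:
        return translation_memory[best]
    return None

def machine_translate(segment):
    return f"<MT: {segment}>"   # stand-in for a real engine call

def pretranslate(segment):
    hit = fuzzy_match(segment)
    if hit:
        return hit, "from TM"
    return machine_translate(segment), "MT - needs post-editing"

print(pretranslate("Click the Save button."))
print(pretranslate("Restart the application."))
```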
What machine translation can I use?
SDL Trados Studio supports a number of machine translation engines available over an internet connection, including SDL Language Cloud, which is provided by SDL. You can choose between various options, from a free package of baseline (untrained) translation up to industry-specific trained engines. You can also use AdaptiveMT, SDL's self-learning machine translation, from within SDL Trados Studio; AdaptiveMT works via SDL Language Cloud MT and learns from your edits in real time as you translate.

What are the benefits to translators?
Increased productivity – deliver translations faster: pre-translate new segments that are not leveraged from the translation memory, or connect to a customer's or supplier's trained engine through SDL Language Cloud for better-quality, industry-specific results.
Flexibility and choice – to suit all types of project: select from a number of different machine translation engines, choose from over 100 languages and more than 2,500 language pairs, and compare the results of rules-based and statistical machine translation engines.

Discover AdaptiveMT
SDL AdaptiveMT is your own personal machine translation engine that adapts as you translate. Accessed directly within SDL Trados Studio, AdaptiveMT learns from your post-edits in real time to retain your style, tone, and content, saving time and minimizing future post-editing. SDL Language Cloud offers secure industry engines to which you can add your own terminology for high-quality output; by offering both machine and human translation, it combines your personal term dictionary with industry-specific, self-learning engines. Because AdaptiveMT learns from your previous translations, you do not have to mend the same issues repeatedly, which saves time and money by minimizing the amount of post-editing needed.
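At its simplest, the "learns from your edits in real time" idea reduces to consulting a store of past post-edits before calling the underlying engine. The class below is a conceptual sketch with invented names; it is not how AdaptiveMT is actually implemented.

```python
# Conceptual sketch of adaptive MT: remember the translator's post-edits
# so that a correction made once is applied automatically thereafter.
class AdaptiveEngine:
    def __init__(self, base_translate):
        self.base_translate = base_translate   # the underlying MT engine
        self.post_edits = {}                   # source segment -> approved edit

    def translate(self, segment):
        # Prefer a remembered human correction over raw MT output.
        return self.post_edits.get(segment) or self.base_translate(segment)

    def learn(self, segment, edited_translation):
        """Record the translator's post-edit for future reuse."""
        self.post_edits[segment] = edited_translation

engine = AdaptiveEngine(lambda s: f"<raw MT for: {s}>")
print(engine.translate("Low battery"))         # raw MT the first time
engine.learn("Low battery", "Batterie schwach")
print(engine.translate("Low battery"))         # the learned edit thereafter
```

Real adaptive systems generalize from edits to unseen but similar segments rather than matching exact strings, which is what makes them genuinely useful.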
What is Machine Translation? Rule-Based Machine Translation vs. Statistical Machine Translation
Machine translation (MT) is automated translation: the process by which computer software is used to translate a text from one natural language (such as English) to another (such as Spanish). For any translation, human or automated, the meaning of a text in the original (source) language must be fully restored in the target language. While on the surface this seems straightforward, it is far more complex. Translation is not mere word-for-word substitution. A translator must interpret and analyze all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax (sentence structure), and semantics (meaning) in both the source and target languages, as well as familiarity with each local region.

Human and machine translation each have their share of challenges. For example, no two individual translators will produce identical translations of the same text in the same language pair, and it may take several rounds of revision to satisfy the customer. The greater challenge, however, lies in how machine translation can produce translations of publishable quality.

Rule-Based Machine Translation Technology
Rule-based machine translation relies on innumerable built-in linguistic rules and millions of bilingual dictionary entries for each language pair. The software parses text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets to transfer the grammatical structure of the source language into the target language. Users can improve the out-of-the-box translation quality by adding their own terminology: they create user-defined dictionaries that override the system's default entries. In most cases there are two steps: an initial investment that significantly increases quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT can bring companies to the quality threshold and beyond, the quality-improvement process may be long and expensive.

Statistical Machine Translation Technology
Statistical machine translation utilizes statistical translation models whose parameters stem from the analysis of monolingual and bilingual corpora. Building statistical translation models is a quick process, but the technology relies heavily on existing multilingual corpora: a minimum of 2 million words is required for a specific domain, and even more for general language. Theoretically it is possible to reach the quality threshold, but most companies do not have such large multilingual corpora from which to build the necessary translation models. Additionally, statistical machine translation is CPU-intensive and requires an extensive hardware configuration to run translation models at average performance levels.

Rule-Based MT vs. Statistical MT
Rule-based MT provides good out-of-domain quality and is by nature predictable. Dictionary-based customization guarantees improved quality and compliance with corporate terminology. But translation results may lack the fluency readers expect. In terms of investment, the customization cycle needed to reach the quality threshold can be long and costly, but performance is high even on standard hardware. Statistical MT provides good quality when large and qualified corpora are available. The translation is fluent, meaning it reads well and therefore meets user expectations. However, the translation is neither predictable nor consistent. Training from good corpora is automated and cheaper, but training on general language corpora (text outside the specified domain) gives poor results. Furthermore, statistical MT requires significant hardware to build and manage large translation models.

Rule-Based MT                                   Statistical MT
+ Consistent and predictable quality            – Unpredictable translation quality
+ Good out-of-domain translation quality        – Poor out-of-domain quality
+ Knows grammatical rules                       – Does not know grammar
+ High performance and robustness               – High CPU and disk-space requirements
+ Consistency between versions                  – Inconsistency between versions
– Lack of fluency                               + Good fluency
– Hard to handle exceptions to rules            + Good for catching exceptions to rules
– High development and customization costs      + Rapid, cost-effective development, provided the required corpus exists
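To make the rule-based column concrete, here is a deliberately tiny rule-based pipeline: a bilingual lexicon supplies word translations, and a single transfer rule reorders adjective and noun for a Romance-language target. Every entry and rule below is a toy; real RBMT systems parse into a full transitional representation.

```python
# Toy rule-based pipeline: one transfer rule plus dictionary lookup.
LEXICON = {"the": "le", "red": "rouge", "car": "voiture", "rolls": "roule"}

def translate(sentence):
    tokens = sentence.lower().split()
    # Transfer rule: English ADJ NOUN -> French NOUN ADJ (toy POS sets).
    adjectives, nouns = {"red"}, {"car"}
    i = 0
    while i < len(tokens) - 1:
        if tokens[i] in adjectives and tokens[i + 1] in nouns:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2
        else:
            i += 1
    # Lexicon lookup; user-defined entries could override these defaults.
    return " ".join(LEXICON.get(t, t) for t in tokens)

print(translate("The red car rolls"))   # -> "le voiture rouge roule"
# The gender error ("le" should be "la") shows why real systems also
# need agreement rules, which is exactly where the rule count explodes.
```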
Given the overall requirements, there is a clear need for a third approach, through which users could reach better translation quality and high performance (similar to rule-based MT) with less investment (similar to statistical MT).

What is Machine Translation?
Machine translation (MT) refers to fully automated software that can translate source content into target languages. Humans may use MT to help them render text and speech into another language, or the MT software may operate without human intervention. MT tools are often used to translate vast amounts of information involving millions of words that could not possibly be translated the traditional way. The quality of MT output can vary considerably; MT systems require "training" in the desired domain and language pair to increase quality. Translation companies use MT to augment the productivity of their translators, cut costs, and provide post-editing services to clients. MT use by language service providers is growing quickly: in 2016, SDL, one of the largest translation companies in the world, announced that it translates 20 times more content with MT than with human teams.

MT Systems
Generic MT usually refers to platforms such as Google Translate, Bing, Yandex, and Naver. These platforms provide MT for ad hoc translations to millions of people. Companies can buy generic MT for batch pre-translation and connect it to their own systems via API.
Customizable MT refers to MT software that has a basic component and can be trained to improve terminology accuracy in a chosen domain (medical, legal, IP, or a company's own preferred terminology). For example, WIPO's specialist MT engine translates patents more accurately than generalist MT engines, and eBay's solution can understand and render into other languages hundreds of abbreviations used in electronic commerce.
Adaptive MT offers suggestions to translators as they type in their CAT tool and learns from their input continuously in real time. Introduced by Lilt in 2016 and by SDL in 2017, adaptive MT is believed to improve translator productivity significantly and may challenge translation memory technology in the future.

There are over 100 providers of MT technologies. Some are strictly MT developers; others are translation firms and IT giants. Examples of MT providers (based on a TAUS report): Google Translate, Microsoft Translator / Bing, SDL BeGlobal, Yandex Translate, Amazon Web Services translator, Naver, IBM Watson Language Translator, Automatic Trans, BABYLON, CCID TransTech Co., CSLi, East Linden, Eleka Ingeniaritza Linguistikoa, GrammarSoft ApS, Iconic Translation Machines, K2E-PAT, KantanMT, Kodensha, Language Engineering Company, Lighthouse IP Group, Lingenio, Lingosail Technology Co., LionBridge, Lucy Software / ULG, MorphoLogic / Globalese, Multilizer, NICT, Omniscien, Pangeanic, Precision Translation Tools (Slate), Prompsit Language Engineering, PROMT, Raytheon, Reverso Softissimo, SkyCode, Smart Communications, Sovee, SyNTHEMA, SYSTRAN, tauyou, Tilde, Trident Software, UTH International, and Worldlingo.

MT Approaches
There are three main approaches to machine translation:
First-generation rule-based (RbMT) systems rely on countless algorithms based on the grammar, syntax, and phraseology of a language.
Statistical systems (SMT) arrived with search and big data. With lots of parallel texts becoming available, SMT developers learned to pattern-match reference texts to find translations that are statistically most likely to be suitable. These systems train faster than RbMT, provided there is enough existing language material to reference.
Neural MT (NMT) uses machine learning technology to teach software how to produce the best result. This process consumes large amounts of processing power, which is why it is often run on graphics processing units. NMT started gaining visibility in 2016, and many MT providers are now switching to this technology.
A combination of two different MT methods is called hybrid MT.
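The pattern-matching idea behind SMT can be seen in miniature with NLTK's IBM Model 1, the classic word-alignment model underlying phrase-based systems; the snippet follows NLTK's documented usage. The toy bitext is orders of magnitude below the millions of words a real engine needs, so the learned probabilities are illustrative only.

```python
# Learning word-translation probabilities from a tiny German-English
# parallel corpus with IBM Model 1 (EM training). pip install nltk.
from nltk.translate import AlignedSent, IBMModel1

# Per NLTK's convention, target-language words come first in AlignedSent.
bitext = [
    AlignedSent(["klein", "ist", "das", "haus"], ["the", "house", "is", "small"]),
    AlignedSent(["das", "haus", "ist", "ja", "gross"], ["the", "house", "is", "big"]),
    AlignedSent(["das", "buch", "ist", "ja", "klein"], ["the", "book", "is", "small"]),
    AlignedSent(["das", "haus"], ["the", "house"]),
    AlignedSent(["das", "buch"], ["the", "book"]),
    AlignedSent(["ein", "buch"], ["a", "book"]),
]

ibm1 = IBMModel1(bitext, 5)   # 5 rounds of expectation-maximization

# P(German word | English word), learned purely from co-occurrence.
print(round(ibm1.translation_table["buch"]["book"], 3))   # high
print(round(ibm1.translation_table["haus"]["book"], 3))   # low
```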
Availability: API, Cloud, Server, Desktop
Google, Microsoft, IBM, Amazon, Yandex, and many others run MT software on their own infrastructure and provide it as a cloud API service, priced per character. For example, it costs $20 to translate 1 million characters with Google Translate. In contrast, developers of customizable MT, including SYSTRAN and PROMT, offer server and desktop products priced per license. In professional translation, MT is most often integrated into the CAT tool: the human linguist can pick a suggestion from MT while working through the text if no better match is found in the translation memory.

Build Your Own MT Engine
There are open-source toolkits anyone can use to build their own engines for any domain and language combination. The most popular baseline packages are Moses for SMT, OpenNMT for neural MT, and Apertium for rule-based MT. Training statistical and neural engines requires a large collection of parallel texts in two languages. Some organizations, such as TAUS, have made a service out of providing baseline data, which companies can further expand by adding their own specialist translations.

Evaluating MT Quality
Translation companies and departments typically evaluate MT quality by the effort it takes for a human to post-edit the output, often measured in pages per hour or in keystrokes per segment. Specialists training MT engines rely on automated tests and metrics, which are better suited to A/B testing and experimentation and show the impact of the tiniest changes, where humans might not notice a difference. The mainstay metric for automatic testing is BLEU. Bilingual evaluation understudy (BLEU) shows how closely an MT translation corresponds to a human translation of the same text: it compares parallel translations and produces a score between 0 (worst) and 1 (best). While BLEU scores are widely used by MT researchers, they can be manipulated, and it takes a specialist to make sense of the results. Other MT quality metrics include METEOR, ROUGE, HyTER, and NIST. Quality metrics are the focus of the QT21 program supported by GALA.
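For a hands-on sense of the metric, NLTK ships a reference implementation of sentence-level BLEU. The sentences below are invented; note that short segments need smoothing, since a single missing 4-gram otherwise drives the score to zero.

```python
# Sentence-level BLEU with NLTK: compare a candidate translation against
# one or more human references; the result lies between 0 and 1.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "is", "on", "the", "mat"]
candidate = ["the", "cat", "sat", "on", "the", "mat"]

smooth = SmoothingFunction().method1   # avoids zero scores on short segments
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(round(score, 3))   # closer to 1 means closer to the human reference
```

This also hints at why BLEU can be gamed: the score rewards n-gram overlap with the references, not meaning, so a fluent paraphrase can score worse than a stilted near-copy.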
Ethics for Translation Providers Using MT
Confidentiality – Content translated by free MT platforms such as Google Translate and Microsoft Translator is not confidential: it is stored by the platform owners and may be reused for later translations.
Notifying the client about MT use – It is a point of debate in the industry whether a translation company should notify clients about the use of MT on their projects. Many practitioners favor informing the customer of MT usage, while others may not disclose it. Be sure to ask your provider if you have questions about MT usage.

Read more: Translation Technology Descriptions; TAUS Machine Translation Market Report 2017.
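For providers who route content through a paid cloud API rather than the free web widgets, the call itself is short. The sketch below assumes Google's google-cloud-translate package and its v2 client, which may have changed since this was written; check the current documentation, and review the provider's data-use terms before sending client material.

```python
# Hedged sketch of a paid cloud MT call (Google Cloud Translation, v2
# client). Requires: pip install google-cloud-translate, plus API
# credentials configured in the environment. Paid API traffic falls
# under the provider's cloud terms, unlike the free consumer widget.
from google.cloud import translate_v2 as translate

client = translate.Client()   # reads credentials from the environment
result = client.translate("The contract is confidential.", target_language="de")
print(result["translatedText"])
```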
While BLEU scores are widely used by MT researchers, they can be manipulated, and it takes a specialist to make sense of the results. Other MT quality metrics include METEOR, ROUGE, HyTER, and NIST. Quality metrics are the focus of the QT21 program supported by GALA.

Ethics for Translation Providers Using MT

Confidentiality: Content translated by free MT platforms such as Google Translate and Microsoft Translator is not confidential. It is stored by the platform owners and may be reused for later translations.

Notifying the client about MT use: It is a point of debate in the industry whether a translation company should notify clients about the use of MT on their projects. Many pundits are in favor of informing the customer of MT usage, while others may not disclose it. Be sure to ask your provider if you have questions about MT usage.

Machine Translation (journal)

Machine Translation (Springer Netherlands; print ISSN 0922-6567, online ISSN 1573-0573; previously published as Computers and Translation) covers all branches of computational linguistics and language engineering, wherever they incorporate a multilingual aspect. It features papers on the theoretical, descriptive, or computational aspects of any of the following topics:

- compilation and use of bi- and multilingual corpora
- computer-aided language instruction and learning
- computational implications of non-Roman character sets
- connectionist approaches to translation
- contrastive linguistics
- corpus-based and statistical language modeling
- discourse phenomena and their treatment in (human or machine) translation
- history of machine translation
- human translation theory and practice
- knowledge engineering
- machine translation and machine-aided translation
- minority languages
- morphology, syntax, semantics, pragmatics
- multilingual dialogue systems
- multilingual information retrieval
- multilingual information society (sociological and legal as well as linguistic aspects)
- multilingual message understanding systems
- multilingual natural language interfaces
- multilingual text composition and generation
- multilingual word processing
- phonetics, phonology
- software localization and internationalization
- speech processing, especially for speech translation

Recent articles include "A user-study on online adaptation of neural machine translation to human post-edits" (Karimova, Simianer, and Riezler, December 2018), "Automatic quality estimation for speech translation using joint ASR and MT features" (Le, Lecouteux, and Besacier, December 2018), and "Reassessing the proper place of man and machine in translation: a pre-translation scenario" (Ive, Max, and Yvon, December 2018).
The journal has appeared since 1986; as of 2018 it comprised 32 volumes, 104 issues, and 576 articles, 8 of them open access.

What is Machine Translation?

Machine translation (MT), or automated translation, is the process by which computer software translates text from one language (such as English) to another (such as Spanish) with no human involvement. People use machine translation from Google and Microsoft in a wide range of situations, and it often seems to work well. You can certainly take advantage of machine translation, but you need to be careful not to jeopardize the overall quality of your translation project.

How does machine translation work?

Machine translation works by having a translation engine match large amounts of source- and target-language content. There are different types of machine translation engines: rule-based, statistical, and neural. Recently there has been a lot of interest in neural machine translation engines. The reason for the excitement is that neural machine translation provides better results for language pairs with less data, and the output reads much better.

Content Privacy

If your project contains confidential material, you might want to avoid using some of the very popular engines.
These technologies split each sentence into smaller segments, and it could be difficult, if not impossible, to recreate the original text from them. It may nevertheless be possible for someone to find this information online.

Involving human translators and reviewers is always necessary

All machine translation engines make mistakes. The translation you get from the machine translation engine might be literally correct, but the tone, wording, or register can be wrong. If you just use machine translation without supervision from translators or reviewers, you might get the phrase "Yo Dude!" translated as if it were "Hello Sir." Clearly, you want to avoid this.

Machine translation in memoQ

memoQ, a computer-assisted translation tool for Microsoft Windows, has integrations with 13 of the most popular machine translation engines. When translating, translators can see suggestions coming from the machine translation engines and use them if they feel they are applicable. This provides a good way to benefit from machine translation, as the translator will ensure that the localized version has the same style and feel as the original. When venturing into machine translation, you need to know that choosing an engine is not a simple task: you should weigh factors such as language pairs and content subject, among others.

Cross language barriers with Tilde custom machine translation

Customize machine translation systems for your language, your terminology, and your style.

Boost translation productivity: the use of MT has been shown to help professional translators work 35% faster, raising efficiency.

Reach global audiences: integrating MT into your online platform allows audiences to read content in their native language.

Access multilingual information: MT enables organizations to analyze and access information from all over the world, in any language.

For localization service providers: the most innovative LSPs are turning to machine translation to help them meet the growing demand for localization. MT can not only boost productivity but also help LSPs reduce costs and drive revenue.

For enterprise users: machine translation helps global businesses reach across language barriers to address consumers in all markets. Customers want to see information in their own native language; MT is the key to reaching global clients.

For public administrations: public administrations use machine translation to enable access to e-services for all citizens and residents. Tilde is a recognized leader in developing MT services for EU governments and organizations.
Why Tilde MT? Check out the special features:

Terminology integration: boost your system's accuracy with terminology integration. This ensures that industry-specific terms are translated correctly and consistently.

Full document translation: translate entire documents with the click of a button. Simply upload or drag and drop a document, and a full translation is provided in seconds.

Data library: don't have enough data? No problem. Tilde can draw from its huge multilingual Data Library to improve your MT system's capabilities.

Neural machine translation: neural machine translation produces more fluent, humanlike translations, substantially boosting the level of MT quality and accuracy.

What clients are saying:

"Tilde MT is among a 'new breed of hosted MT providers' that has successfully 'simplified access for small language companies [to MT] and enabled them to use it.' Many have flocked to it since that time." (Common Sense Advisory)

"The site translated 2,750 stories last year, but it is working on making the translation process more efficient. One way it's working to do that is through machine translations [...] with a Latvian company, Tilde. EurActiv [...] hopes the new technology will make the translation process three times faster." (Nieman Journalism Lab, Harvard University)

"Tilde has developed the machine translation tool Hugo.lv, which has considerably improved the availability of the e-government services of Latvia to customers from the Latvian, English, and Russian language communities in Europe and the whole world." (Edgars Rinkevics, Minister of Foreign Affairs, Republic of Latvia)

Machine Translation, by M. Kay

At the end of the 1950s, researchers in the United States, Russia, and Western Europe were confident that high-quality machine translation (MT) of scientific and technical documents would be possible within a very few years. After the promise had remained unrealized for a decade, the National Academy of Sciences of the United States published the much cited but little read report of its Automatic Language Processing Advisory Committee. The ALPAC Report recommended that the resources being expended on MT as a solution to immediate practical problems be redirected toward more fundamental questions of language processing that would have to be answered before any translation machine could be built. The number of laboratories working in the field was sharply reduced all over the world, and few of them were able to obtain funding for more long-range research programs in what then came to be known as computational linguistics. There was a resurgence of interest in machine translation in the 1980s and, although the approaches adopted differed little from those of the 1960s, many of the efforts, notably in Japan, were rapidly deemed successful. This seems to have had less to do with advances in linguistics and software technology, or with the greater size and speed of computers, than with a better appreciation of special situations where ingenuity might make a limited success of rudimentary MT.
The most conspicuous example was the METEO system, developed at the University of Montreal, which has long provided the French translations of the weather reports used by airlines, shipping companies, and others. Some manufacturers of machinery have found it possible to translate maintenance manuals used within their organizations (not by their customers) largely automatically, by having the technical writers use only certain words and only in carefully prescribed ways.

Why Machine Translation Is Hard

Many factors contribute to the difficulty of machine translation, including words with multiple meanings, sentences with multiple grammatical structures, uncertainty about what a pronoun refers to, and other problems of grammar. But two common misunderstandings make translation seem altogether simpler than it is. First, translation is not primarily a linguistic operation, and second, translation is not an operation that preserves meaning.

There is a famous old example that makes the first point well. Consider the sentence: "The police refused the students a permit because they feared violence." Suppose that it is to be translated into a language like French, in which the word for "police" is feminine. Presumably the pronoun that translates "they" will also have to be feminine. Now replace the word "feared" with "advocated." Suddenly, it seems that "they" refers to the students and not to the police and, if the word for students is masculine, it will therefore require a different translation. The knowledge required to reach these conclusions has nothing linguistic about it. It has to do with everyday facts about students, police, violence, and the kinds of relationships we have seen these things enter into.

The second point is, of course, closely related. Consider the following question, stated in French: "Où voulez-vous que je me mette?" It means literally, "Where do you want me to put myself?" but it is a very natural translation for a whole family of English questions of the form "Where do you want me to sit/stand/sign my name/park/tie up my boat?" In most situations, the English "Where do you want me?" would be acceptable, but it is natural and routine to add or delete information in order to produce a fluent translation. Sometimes it cannot be avoided, because there are languages like French, in which pronouns must show number and gender; Japanese, where pronouns are often omitted altogether; Russian, where there are no articles; Chinese, where nouns do not differentiate singular from plural nor verbs present from past; and German, where the flexibility of the word order can leave uncertainty about what is the subject and what is the object.

The Structure of Machine Translation Systems

While there have been many variants, most MT systems, and certainly those that have found practical application, have parts that can be named for the chapters in a linguistics textbook. They have lexical, morphological, syntactic, and possibly semantic components, one for each of the two languages, for treating basic words, complex words, sentences, and meanings. Each feeds into the next until a very abstract representation of the sentence is produced by the last one in the chain. There is also a "transfer" component, the only one that is specialized for a particular pair of languages, which converts the most abstract source representation that can be achieved into a corresponding abstract target representation. The target sentence is produced from this essentially by reversing the analysis process. Some systems make use of a so-called "interlingua," or intermediate language, in which case the transfer stage is divided into two steps, one translating a source sentence into the interlingua and the other translating the result of this into an abstract representation in the target language.
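The classical transfer pipeline just described can be sketched schematically in Python. Everything below (the toy lexicon, the part-of-speech table, the single reordering rule) is invented for illustration; in real systems each stage is a large hand-built module.

```python
# A toy transfer-based MT pipeline: analysis -> transfer -> generation.
# The lexicon, POS table, and reordering rule are invented for illustration;
# unknown words would need real morphological analysis.

LEXICON = {("the", "DET"): "el", ("black", "ADJ"): "negro", ("cat", "NOUN"): "gato"}
POS = {"the": "DET", "black": "ADJ", "cat": "NOUN"}

def analyze(sentence):
    # Morphological/syntactic analysis: here, just tokenization and POS lookup.
    return [(tok, POS[tok]) for tok in sentence.lower().split()]

def transfer(analysis):
    # The transfer component is the only language-pair-specific part.
    # Structural rule: English ADJ+NOUN becomes Spanish NOUN+ADJ.
    out, i = [], 0
    while i < len(analysis):
        if i + 1 < len(analysis) and analysis[i][1] == "ADJ" and analysis[i + 1][1] == "NOUN":
            out += [analysis[i + 1], analysis[i]]
            i += 2
        else:
            out.append(analysis[i])
            i += 1
    return out

def generate(target_analysis):
    # Generation: look up target words and reassemble the sentence.
    return " ".join(LEXICON[pair] for pair in target_analysis)

print(generate(transfer(analyze("the black cat"))))  # -> "el gato negro"
```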
Machine translation with post-editing

Benefits of machine translation: immediate insight into texts in foreign languages, regardless of volume and format; professional post-editing to manage the quality of automated translation according to your requirements; and a 30-50% reduction in translation delivery time and cost.

Our approach to machine translation: MT solutions customized for a particular company or industry; MT post-editing services at a competitive rate; adjustable delivery models, from SaaS to on-premise; integration of automated translation into corporate content management systems and workflows; and confidential translations.

Our expertise includes working with three types of machine translation systems:

Statistical (SMT) engines analyze the source language based on existing lexical resources, such as good-quality translation memories and terminology databases, to select the most appropriate translations in the target language. Providing high-quality data is paramount to ensuring the desired MT output.

Rule-based (RBMT) engines analyze the grammar in each segment and then invoke specific rules to derive target from source for a given language pair. These engines also analyze the syntactic structure of the source sentence, trying to adapt it to common target-language patterns.

Model-based (MBMT) engines use full semantic and syntactic analysis of the source text prior to translation. This MT system is based on the patented ABBYY Compreno technology. It transforms strings of characters into data that "makes sense" to a computer and then generates the translation based on the meaning of the source text.

We go several steps further by combining controlled language, translation memory, and MT systems, all of which are enhanced by ABBYY's own semantic, morphological, and lexical analysis tools. This customization, based on an individual client's data, ensures maximum quality of the output before it is sent to post-editing. The specific scenario (raw or post-edited MT) is always up to the client. In any case, our job is to deliver an MT solution that saves both time and money and keeps operational overhead low.

A lot of people have been talking about machine translation in customer service. It would make replying to tickets more efficient, improve the customer experience, and even help companies expand to different countries without having to hire native agents. However, most customer service leaders are skeptical about introducing this technology into their workflows, and for good reason. For most people, Google Translate is the first thing that comes to mind at the mention of machine translation (MT).
But would you really trust it to translate everything you need to tell your customers? Probably not, considering mistakes like mistranslating the name of a Spanish food festival as "clitoris festival," or identifying the phrase "Ooga Booga Wooga" as Somali. Nonetheless, this doesn't make machine translation irrelevant to customer service. And here's why.

Even if you have the best customer support agents at your disposal, their ability to serve clients has one obvious limitation: language. So, what are your options if you need to provide customer support in markets with different languages? You can hire a bunch of native agents and train them (which is costly and time-consuming). Or you can automate translation (reducing costs and making your team more efficient). Imagine if your French-speaking agents could seamlessly communicate with Chinese customers in their native language (in this case, Mandarin). Wouldn't it be great? Or if you could distribute multilingual customer support tickets equally among team members, regardless of what languages they speak, during peak season? Wouldn't it be the holy grail of operational efficiency? The answer is yes, of course it would. But there's one significant detail that prevents most customer service managers from automating translation, and that is quality.

Machine translation quality: we're gonna have to earn it

CS operational managers (as well as most people) perceive translation quality from a single point of view: it must be perfect. On the other hand, in rapidly expanding businesses, language is nothing more than a tool, and its quality should be fit for purpose. So how do you make sure you have the highest-quality translations without having to hire an international community of customer support agents to rival the Eurovision lineup? Well, that's what we're working on at Unbabel. We combine the best of machine translation with a community of tens of thousands of bilingual editors who review and approve the translations. And part of the reason why we involve humans in the process is that machine translation alone can't yet deliver the quality we need.

For machine translation to work, we need human translations to feed into the systems and train them. Once the system receives all the data, it starts to learn patterns and to produce better translations. But what if humans weren't involved in this process? Would machine translation be enough for customer service? I doubt it. And let me tell you why. At Unbabel, we have translated crazy amounts of customer support messages for companies like Booking.com, EasyJet, Under Armour, and King, and if there's one thing we know, it's that machines make mistakes (some of which are not so easy to spot). Below are some of the most common mistakes made by machines in translations of customer service messages, mistakes that our community of editors have spotted and corrected.

1. Corrupted meaning: it's a free-for-all

No company likes to give things away for free. Needless to say, it'd be bad for business if your translations left customers thinking that you do. Here's an example of an actual translation which had to be reviewed and edited by our bilingual editors:

Source (English): You recently notified us of the possibility that copyrighted material was being made available through our website.

Machine translation (German): Sie haben uns vor Kurzem von der Überzeugung in Kenntnis gesetzt, dass urheberrechtlich geschütztes Material auf unserer Website kostenlos verfügbar ist.
[You recently notified us of a belief that copyrighted material was being made available at no cost through our website.]

The problem with this is that the word "available" was translated into German as "available at no cost."

2. MT peculiarities: where am I?

Some travelers learn to love the unexpected. But nobody wants to end up stranded in the wrong city because of a translation error.

Source (Russian): Наш хостел расположен в деревне Туришкино, которая находится в 60 км от Санкт-Петербурга. [Our hostel is located in the village of Turishkino, which is 60 km away from St. Petersburg.]

Machine translation (English): Our hostel is located in village Tururushkaino, which is 60 km away from St. Petersburg.

Since the neural machine translation system did not have the name of the village "Туришкино" in its lexical bank (to be fair, it's a pretty rare word), it had to translate it into something else. Wrong translation, wrong city. This may also happen when you convert units of length:

Source (English): If you live just 20 kilometres away from San Diego, you may consider driving to the Westfield Mission Valley mall and collecting it yourself.

Machine translation (French): Si vous habitez à seulement 20 milles de San Diego, vous pouvez envisager de vous rendre au centre commercial Westfield Mission Valley et de le récupérer vous-même. [Here "20 kilometres" became "20 miles."]

3. MT hallucinations: the ghost of texts past

Sometimes machines see things that aren't actually there, haunted as they are by the memory of translations in their database. We like to call this phenomenon MT hallucinations. For instance, the machine may add unnecessary words to the translation, as in the example below:

Source (English): The contract is understandable.

Machine translation (French): Le contrat est compréhensible, veuillez nous appeler dès que possible. [The contract is understandable, please call us as soon as possible.]

In this case, what happened was that the MT system referred to previous translation examples and generated an extra clause which did not appear in the source text: "please call us as soon as possible." But the MT system can also do the opposite and erase parts of the message:

Source (English): It looks like it took a while for the subscription to be marked inactive but it is cancelled now.

Machine translation (German): Es scheint, dass es eine Weile gedauert hat, bis das Abonnement als inaktiv markiert wurde. [It looks like it took a while for the subscription to be marked inactive.]

In this case, the whole chunk of text "but it is cancelled now" was not translated into German.

4. Register and tone of voice: what did you call me?

Languages have their own set of rules; that's part of the reason why translation is so difficult. But when it comes to adapting register and tone of voice in customer service, you need to be extra careful with how you address people. Here's a simple example of the incorrect use of a pronoun in machine translation:

Source (English): Make sure you have the latest operating system on your device.

Machine translation (German): Stellst du sicher, dass du das neueste Betriebssystem auf deinem Gerät hast. [Make sure you [informal] have the latest operating system on your [informal] device.]

The customer usually defines the choice of register. However, the use of an inappropriate register (like the informal "du" instead of the formal "Sie" in this example) can be a real threat when communicating with customers, who may see it as impolite.

5. Overtranslations: is that an off-brand?
Some words are not supposed to be translated, like a company's name or a person's name. But machines don't always know that. So, overtranslations such as this one are quite common:

Source (English): I checked with the seller and as long as it is not a Rapid Cheetah product, it is fine.

Machine translation (German): Ich habe mit dem Verkäufer überprüft und solange es kein Schnellesgeparden produkt ist, ist es in Ordnung. [I checked with the seller and as long as it is not a rapid cheetah product, it is fine.]

Here, the brand's name, "Rapid Cheetah," is given as a literal translation in German. Sure, it's funny, but it can also be confusing or even off-putting to customers.

6. Inconsistent or incorrect use of terminology: too many words

One word may have different translations, and you need to know exactly which one to use when communicating with your customers. And when things go wrong, it can look weird:

Source (English): Packages 1 and 2 both charge a monthly fee, as these have additional features to Package 1.

Machine translation (Dutch): Pakketten 1 en 2 vragen elk een maandelijks bedrag, omdat deze extra functies hebben voor Pakket 1. [Abonnements 1 and 2 both charge a monthly fee, as these have additional features to Abonnement 1.]

In this example, the term "package" was required to be translated as "abonnement" and not "pakket." I guess the MT system chose the wrong word.

In short, pure machine translation systems lack the "human touch" required for understanding cultural references and contextual differences. Today, however, MT combined with advanced, automated quality assurance and post-editing by humans ensures translations that are sound, and sound good, often delivered within 20 minutes. This is a game changer for customer service, where it's really not just a matter of quality but also speed. In a world where customers are not willing to wait more than 10 minutes to get their problems solved, attending to their needs in their native language on time is crucial. And this is where machine translation can help. Machine translation may not be at the end of its road, but it has come a long way toward meeting critical business needs. And this is just the beginning.

(Maxim Khalilov, Ph.D., is the director of Applied AI at Unbabel.)

Machine Translation (MT) is a technology that automatically translates text using termbases and advanced grammatical, syntactic, and semantic analysis techniques.
The idea that computers can translate human languages is as old as computers themselves. The first attempts to build such technology in the 1950s in the USA were accompanied by a lot of enthusiasm and significant funding. However, the first decade of research failed to produce a usable system, and the now-famous report by the Automatic Language Processing Advisory Committee (ALPAC) in 1966 found that the ten-year effort had failed to fulfill expectations. The next time the general public heard of MT was likely in the late 1990s, when the internet portal AltaVista launched a free online translation service called Babelfish. Although the quality was often lacking, it became immensely popular and brought MT into the limelight again. Other internet giants presented similar services soon after, the best known of which is now Google Translate. Despite great strides in technology and the addition of dozens of new language pairs, these free services are usable for "gist" or casual translation, but usually not for commercial purposes. On the other hand, commercial providers of MT technology have worked on improving their paid offerings, and with customization such machine translation engines are finding commercial use in limited areas. However, challenges with understanding context, tone, language registers, and informal expression remain the reason why MT is not expected to replace human translators in the foreseeable future. The main use cases for machine translation are applications that require real-time or near real-time interaction, assimilating texts and "chat," and productivity tools supporting human translators. Machine translation is not to be confused with Computer-Aided Translation (CAT) tools.

What is MT Suitable for?

The most common uses of MT technology are as follows:

Gisting – The results of MT are generally not as good as translations produced by humans, but are useful for understanding roughly what a text says. Such translation may be good enough depending on the purpose and target audience.

MT-human – In some cases, human translators edit machine translation results to produce final translations, in what is called post-editing.

Instant need – MT can be used for providing translations of materials that are time-sensitive and cannot wait for the time required for human translation, such as results from database queries.

Controlled language – For texts written in controlled language, customized MT engines can provide very high-quality translations, for example in the translation of patents or technical specification sheets.

High volume – Content producers are generating exponentially increasing volumes of material, and in many cases human translation is simply not economically or technically feasible.

Pseudotranslation – Localizers can use MT to translate source text to check for internationalization issues in the target languages before committing to professional translation.

Support for human translators – Modern CAT tools allow users to translate source segments with MT. Translators can decide to use the results as they are or edit them manually, which can speed up their work.

Types of Machine Translation

Rule-Based Machine Translation (RBMT)

RBMT, developed several decades ago, was the first practical approach to machine translation. It works by parsing a source sentence to identify words and analyze its structure, and then converting it into the target language based on a manually determined set of rules encoded by linguistic experts.
The rules attempt to define correspondences between the structure of the source language and that of the target language. The advantage of RBMT is that a good engine can translate a wide range of texts without the need for the large bilingual corpora that statistical machine translation requires. However, the development of an RBMT system is time-consuming and labor-intensive and may take several years for one language pair. Additionally, human-encoded rules are unable to cover all possible linguistic phenomena, and conflicts between existing rules may lead to poor translation quality when facing real-life texts. For example, RBMT engines don't deal well with slang or metaphorical texts. For this reason, rule-based translation has largely been replaced by statistical machine translation or hybrid systems, though it remains useful for less common language pairs where there are not enough corpora to train an SMT engine.

Statistical Machine Translation (SMT)

SMT works by training the translation engine with a very large volume of bilingual (source texts and their translations) and monolingual corpora. The system looks for statistical correlations between source texts and translations, both for entire segments and for shorter phrases within each segment, building a so-called translation model. It then generates confidence scores for how likely it is that a given source text will map to a translation. (A toy sketch of this scoring idea appears after the survey of example-based MT below.) The translation engine itself has no notion of rules or grammar. SMT is the core of the systems used by Google Translate and Bing Translator, and it is the most common form of MT in use today. The key advantage of statistical machine translation is that it eliminates the need to handcraft a translation engine for each language pair and to create linguistic rule sets, as is the case with RBMT. With a large enough collection of texts, you can train a generic translation engine for any language pair, and even for a particular industry or domain of expertise. With large and suitable training corpora, SMT usually translates well enough for comprehension. The main disadvantage of statistical machine translation is that it requires very large and well-organized bilingual corpora for each language pair. SMT engines fail when presented with texts that are not similar to the material in the training corpora; for example, a translation engine that was trained using technical texts will have a difficult time translating texts written in casual style. Therefore, it is important to train the engine with texts that are similar to the material that will be translated.

Example-Based Machine Translation (EBMT)

In an EBMT system, a sentence is translated by analogy. A number of existing translation pairs of source and target sentences are used as examples. When a new source sentence is to be translated, the examples are retrieved to find similar ones in the source; the target sentence is then generated by imitating the translation of the matched examples. Because the hit rate for long sentences is very low, the examples and the source sentence are usually broken down into small fragments. This approach may produce high-quality translation when highly similar examples are found. Conversely, when no similar example is found, the translation quality may be very low. EBMT has not been widely deployed as a commercial service.
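Returning to the statistical approach: the "translation model times language model" scoring that SMT performs can be caricatured in a few lines of Python. The phrases and probabilities below are invented toy values, not real corpus statistics, and a real decoder also searches over segmentations and reorderings rather than scoring whole phrases in isolation.

```python
# Toy noisy-channel SMT scoring: choose the target phrase e maximizing
# P(f | e) * P(e), i.e., translation-model probability times
# language-model probability. All numbers are invented.

translation_model = {  # P(source_phrase | target_phrase)
    ("chat", "cat"): 0.5,
    ("chat", "online chat"): 0.9,
}
language_model = {  # P(target_phrase), estimated from monolingual data
    "cat": 0.010,
    "online chat": 0.001,
}

def best_translation(source_phrase):
    candidates = [e for (f, e) in translation_model if f == source_phrase]
    return max(candidates,
               key=lambda e: translation_model[(source_phrase, e)] * language_model[e])

# French "chat": the language model tips the balance toward "cat".
print(best_translation("chat"))  # -> "cat" (0.5 * 0.010 beats 0.9 * 0.001)
```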
Neural Machine Translation (NMT)

Neural machine translation (NMT) is based on the paradigm of machine learning and is the newest approach to MT. NMT uses neural networks that consist of nodes conceptually modeled after the human brain. The nodes can hold single words, phrases, or longer segments, and they relate to each other in a web of complex relationships based on the bilingual texts used to train the system. The complex and dynamic nature of such networks allows the formation of significantly more educated guesses about the context, and therefore the meaning, of any word to be translated. NMT systems continuously learn and adjust to provide the best output, and they require a lot of processing power, which is why this approach has only become viable in recent years.

Hybrid

All the approaches above have their shortcomings, and many hybrid MT approaches have been proposed. The main categories of hybrid systems are: rule-based engines using statistical translation for post-processing and cleanup; statistical systems guided by rule-based engines; and either of the above combined with some input from a neural machine translation system. In the first case, the text is translated first by an RBMT engine; this translation is then processed by an SMT engine, which corrects any errors it made. In the second case, the RBMT engine does not translate the text but supports the SMT engine by inserting metadata (e.g., noun/verb/adjective, present/past tense, etc.). Almost all practical MT systems adopt hybrid approaches to a certain extent, combining rule-based and statistical approaches, and more and more systems now also take advantage of NMT to different degrees.

Measuring Quality of MT

Measuring and benchmarking MT quality remains a difficult challenge. While standardized quality scales exist, they provide only a comparative, not an absolute, measure of quality. This matters because what's really needed is an automated way to identify problem texts so they can be routed for human review and post-editing. At present, the standard practice is to have human reviewers look at a certain percentage of texts, or spend an assigned amount of time reviewing a subset of a project. The most reliable method of MT quality evaluation requires human evaluators to score each sentence, either within text translated by an MT engine or in comparison with others. The average score on all the sentences from all evaluators is the final score. The most common criteria for human scoring are the adequacy and fluency of the translation. Human evaluation is expensive and time-consuming and thus unsuitable for frequent use during the research and development of MT engines. Various automatic evaluation methods are therefore available to measure the similarity between an MT translation and one produced by a human translator. Some examples:

Word error rate (WER) is defined based on the distance between the system output and the reference translation at the word level.

Position-independent error rate (PER) calculates the word error rate by treating each sentence as a bag of words and ignoring word order.

Bilingual Evaluation Understudy (BLEU) computes n-gram precision rather than word error rate.

Metric for Evaluation of Translation with Explicit Ordering (METEOR) takes stemming and synonyms into consideration.

Automatic translation quality evaluation plays an important role in MT research, since it helps measure quality between iterations of an engine and between different engines. However, the correlation between automatic and human evaluation metrics is not satisfactory.
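As a concrete illustration of the first metric on this list, here is a compact word error rate implementation using the standard word-level edit-distance dynamic program. This is a generic sketch rather than any particular toolkit's code.

```python
def word_error_rate(hypothesis, reference):
    """WER: word-level edit distance between hypothesis and reference,
    normalized by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deleted word out of six reference words: WER ≈ 0.167.
print(word_error_rate("the cat sat on mat", "the cat sat on the mat"))
```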
Unsupervised machine translation: A novel approach to provide fast, accurate translations for more languages

Marc'Aurelio Ranzato, Guillaume Lample, Myle Ott (posted on Aug 31, 2018, to AI Research)

Automatic language translation is important to Facebook as a way to allow the billions of people who use our services to connect and communicate in their preferred language. To do this well, current machine translation (MT) systems require access to a considerable volume of translated text (e.g., pairs of the same text in both English and Spanish). As a result, MT currently works well only for the small subset of languages for which a volume of translations is readily available. Training an MT model without access to any translation resources at training time (known as unsupervised translation) was the necessary next step. Research we are presenting at EMNLP 2018 outlines our recent accomplishments with that task. Our new approach provides a dramatic improvement over previous state-of-the-art unsupervised approaches and is equivalent to supervised approaches trained with nearly 100,000 reference translations. To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points. This is an important finding for MT in general, and especially for the majority of the 6,500 languages in the world for which the pool of available translation training resources is either nonexistent or so small that it cannot be used with existing systems. For low-resource languages, there is now a way to learn to translate between, say, Urdu and English by having access only to text in English and completely unrelated text in Urdu, without having any of the respective translations. This new method opens the door to faster, more accurate translations for many more languages. And it may only be the beginning of ways in which these principles can be applied to machine learning and artificial intelligence.
Word-by-word translation

The first step toward our ambitious goal was for the system to learn a bilingual dictionary, which associates a word with its plausible translations in the other language. For this, we used a method we introduced in a previous paper, in which the system first learns word embeddings (vectorial representations of words) for every word in each language. Word embeddings are trained to predict the words around a given word using context (e.g., the five words preceding and the five words following a given word). Despite their simplicity, word embeddings capture interesting semantic structure. For instance, the nearest neighbor of "kitty" is "cat," and the embedding of the word "kitty" is much closer to the embedding of "animal" than it is to the embedding of the word "rocket" (as "rocket" seldom appears in the context of the word "kitty"). Moreover, embeddings of words in different languages share similar neighborhood structure, because people across the world share the same physical world; for instance, the relationship between the words "cat" and "furry" in English is similar to that between their Spanish translations ("gato" and "peludo"), as the frequency of these words and their contexts are similar.

Because of those similarities, we proposed having the system learn a rotation of the word embeddings in one language to match the word embeddings in the other language, using a combination of various new and old techniques, such as adversarial training. With that information, we can infer a fairly accurate bilingual dictionary without access to any translation and essentially perform word-by-word translation. (A minimal numeric sketch of this alignment idea follows below.)

[Figure: Two-dimensional word embeddings in two languages (left) can be aligned via a simple rotation (right). After the rotation, word translation is performed via nearest-neighbor search.]
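When a seed dictionary is available, the alignment mentioned above can be computed in closed form as the orthogonal Procrustes problem. The sketch below uses tiny invented 2-D "embeddings"; real systems use vectors with hundreds of dimensions and, in the fully unsupervised setting, obtain the seed pairs adversarially rather than from a dictionary.

```python
# Minimal sketch: align two word-embedding spaces with an orthogonal map
# (the Procrustes solution), given a small seed dictionary.
import numpy as np

# Rows are embeddings of seed-dictionary word pairs, e.g. (cat, gato), ...
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9]])  # English vectors (toy)
Y = np.array([[0.1, 0.9], [0.3, 0.8], [0.9, 0.1]])  # Spanish vectors (toy)

# Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of Y^T X.
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt  # maps English vectors into the Spanish space

def translate(vec, target_matrix, words):
    # Nearest-neighbor search (cosine similarity) in the aligned space.
    mapped = vec @ W.T
    sims = (target_matrix @ mapped) / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(mapped))
    return words[int(np.argmax(sims))]

print(translate(X[0], Y, ["gato", "perro", "manzana"]))  # -> "gato"
```

The same nearest-neighbor search, run over an entire vocabulary, reads a bilingual dictionary directly off the aligned spaces.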
Translating sentences

Word-by-word translation using a bilingual dictionary inferred in an unsupervised way is not a great translation: words may be missing, out of order, or just plain wrong. However, it preserves most of the meaning. We can improve upon it by making local edits using a language model that has been trained on lots of monolingual data to score sequences of words, in such a way that fluent sentences score higher than ungrammatical or poorly constructed sentences. So, if we have a large monolingual data set in Urdu, we can train a language model in Urdu alongside the language model we have for English.

Equipped with a language model and the word-by-word initialization, we can now build an early version of a translation system. Although it's not very good yet, this early system is already better than word-by-word translation (thanks to the language model), and it can be used to translate lots of sentences from the source language (Urdu) to the target language (English). Next, we treat these system translations (original sentence in Urdu, translation in English) as ground-truth data to train an MT system in the opposite direction, from English to Urdu. Admittedly, the input English sentences will be somewhat corrupted by the translation errors of the first system. This technique was introduced by R. Sennrich et al. at ACL 2015 in the context of semisupervised learning of MT systems (for which a good number of parallel sentences are available), and it was dubbed back translation. This is the first time the technique has been applied to a fully unsupervised system; typically, it is initially trained on supervised data.

Now that we have an Urdu language model that will prefer the more fluent sentences, we can combine the artificially generated parallel sentences from our back translation with the corrections provided by the Urdu language model to train a translation system from English to Urdu. Once that system has been trained, we can use it to translate many sentences in English to Urdu, forming another data set of the kind (original sentence in English, translation in Urdu) that can improve the previous Urdu-to-English MT system. As one system gets better, we can use it to produce training data for the system in the opposite direction, in an iterative manner, for as many iterations as desired.

[Figure. Top: a sentence in English is translated to Urdu using the current En-Ur MT system; the Ur-En MT system then takes that Urdu translation as input and produces an English translation. The error between "cats are crazy" and "cats are lazy" is used to change the parameters so that the Ur-En MT system is more likely to output the correct sentence at the next iteration. Bottom: the same process in reverse, using the Ur-En MT system to provide data for the En-Ur MT system.]
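The iterative procedure just described has a simple loop structure, caricatured below in runnable form. Here a "model" is just a word-for-word dictionary learned by position-aligning sentence pairs, and the romanized "Urdu" strings are toy stand-ins; real systems train a neural model at each step.

```python
# A deliberately tiny, runnable caricature of iterative back-translation.
from collections import Counter, defaultdict

def train(pairs):
    # "Training": position-align words in (source, target) sentence pairs
    # and keep the most frequent mapping for each source word.
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            counts[s][t] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda sents: [" ".join(table.get(w, w) for w in s.split()) for s in sents]

def iterative_back_translation(mono_src, mono_tgt, seed_model, rounds=2):
    src_to_tgt = seed_model  # word-by-word dictionary + LM initialization
    for _ in range(rounds):
        synthetic_tgt = src_to_tgt(mono_src)                    # forward translate
        tgt_to_src = train(list(zip(synthetic_tgt, mono_src)))  # train reverse system
        synthetic_src = tgt_to_src(mono_tgt)                    # back translate
        src_to_tgt = train(list(zip(synthetic_src, mono_tgt)))  # retrain forward system
    return src_to_tgt

# Toy seed dictionary; "billi susti hai" is a romanized toy stand-in for Urdu.
seed = train([("cats are lazy", "billi susti hai")])
model = iterative_back_translation(["cats are lazy"], ["billi susti hai"], seed)
print(model(["cats are lazy"]))  # -> ['billi susti hai']
```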
The best of both worlds

In our research, we identified three steps (word-by-word initialization, language modeling, and back translation) as important principles for unsupervised MT. Equipped with these principles, we can derive various models. We applied them to two very different methods to tackle our goal of unsupervised MT. The first was an unsupervised neural model that was more fluent than word-by-word translation but did not produce translations of the quality we wanted. They were, however, good enough to be used as back-translation sentences. With back translation, this method performed about as well as a supervised model trained with 100,000 parallel sentences. Next, we applied the principles to another model based on classical count-based statistical methods, dubbed phrase-based MT. These models tend to perform better on low-resource language pairs, which made the approach particularly interesting, and this is the first time the method has been applied to unsupervised MT. In this case, we found that the translations had the correct words but were less fluent. Again, this method outperformed previous state-of-the-art unsupervised models. Finally, we combined both models to get the best of both worlds: a model that is both fluent and good at translating. To do this, we started from a trained neural model and then trained it with additional back-translated sentences from the phrase-based model. Empirically, we found that this last combined approach dramatically improved accuracy over the previous state-of-the-art unsupervised MT, showing an improvement of more than 10 BLEU points on English-French and English-German, two language pairs that have been used as a test bed (and even for these pairs, no parallel data is used at training time; it is used only at test time, to evaluate). We also tested our methods on distant language pairs like English-Russian; on low-resource languages like English-Romanian; and on an extremely low-resource and distant language pair, English-Urdu. In all cases, our method greatly improved over other unsupervised approaches, and sometimes even over supervised approaches that use parallel data from other domains or from other languages.

[Figure: German-to-English translation examples showing the results of each machine translation method.]

Beyond MT

Achieving an increase of more than 10 BLEU points is an exciting start, but even more exciting for us are the possibilities this opens for future improvements. In the short term, this will certainly help us translate in many more languages and improve translation quality for low-resource languages. But the learnings gained from this new method and the underlying principles could go well beyond MT. We see potential for this research to be applied to unsupervised learning in any arena, potentially allowing agents to leverage unlabeled data and perform tasks with very few, if any, of the expert demonstrations (translations, in this case) that are currently required. This work shows that it is at least possible for a system to learn without supervision and to build a coupled system in which each component improves over time, in a sort of virtuous circle.

Accurate, natural language translation: get started with Amazon Translate

Amazon Translate is a neural machine translation service that delivers fast, affordable, high-quality language translation. Neural machine translation is a machine translation method that uses deep learning models to produce translation that is more fluent and natural-sounding than traditional statistical and rule-based translation algorithms. Amazon Translate lets you localize content (websites and applications) for international users and easily translate large volumes of text. (AWS Summit San Francisco 2018: Amazon Translate is now available to everyone.)

Benefits: extreme accuracy and continuous learning; easy integration into your applications; customizable; scalable. Amazon Translate's translation engines learn continuously from new and expanded data sets to produce more accurate translations for a wide range of use cases. Amazon Translate makes it simple to build real-time and batch translation capabilities into your applications with a single API call, so you can easily localize an application or website, or process multilingual data within your existing workflows. Amazon Translate also lets you define how your brand names, character names, model names, and other unique terms are translated, using its custom terminology feature. The ability to customize results with custom terminology can reduce the number of translations professional translators have to edit, leading to cost savings and faster turnaround. Whether you have a few words or large volumes of text, Amazon Translate scales easily to your translation needs, providing fast, reliable translation regardless of the volume of translation requests you submit.
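As an illustration of the "single API call" integration described above, here is a short Python sketch using the AWS SDK (boto3). The call shown is the Translate service's standard TranslateText operation; check the AWS documentation for current parameters and supported language codes before relying on this.

```python
# Minimal sketch: real-time translation through the Amazon Translate API
# using boto3. Assumes AWS credentials are configured in the environment.
import boto3

translate = boto3.client("translate", region_name="us-east-1")

response = translate.translate_text(
    Text="Machine translation has come a long way.",
    SourceLanguageCode="en",
    TargetLanguageCode="fr",
)
print(response["TranslatedText"])
```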
Machine translation use cases: multilingual sentiment analysis of social media content; on-the-fly translation of user-generated content; adding real-time translation to communication applications.

With Amazon Translate, you are not limited by the language barrier. Track the social sentiment around your brand, product, or service while monitoring online conversations in different languages: simply translate the text into English before using a natural language processing (NLP) application such as Amazon Comprehend to analyze text content in a multitude of languages.

It is very difficult for human translation teams to keep up with dynamic or real-time content. With Amazon Translate, you can easily translate large volumes of user-generated content in real time. Websites and applications can automatically render content such as articles, profile descriptions, and comments in the user's language at the click of a "Translate" button.

Amazon Translate can provide automatic translation to enable cross-lingual communication between the users of your applications. By adding real-time translation to chat, messaging, help-desk, and ticketing applications, an English-speaking agent or employee can communicate with customers in multiple languages.

Amazon Translate customers:

"At Hotels.com, we are committed to providing our customers with the most relevant and up-to-date information about their destination. To do that, we run 90 localized websites in 41 languages. We have more than 25 million customer reviews, with more coming in every day, which makes our sites ideal candidates for machine translation. We evaluated Amazon Translate and several other solutions, and in our view Amazon Translate is a fast, efficient, and above all accurate service. We want to take advantage of the latest advances in machine learning and of the transition to neural engines to further personalize and localize reviews, and to improve our customers' overall experience. Amazon Translate is a step forward in that direction." (Matthew Fryer, Vice President and Chief Data Science Officer, Hotels.com)

"Today's digital businesses are under pressure to produce ever more content, faster and with greater relevance. Human translators armed with machine translation help businesses localize more content, faster, at lower cost, and in more languages. In our experience, by pairing Amazon Translate with a human editor, we believe we can generate savings of up to 20%." (Ken Watson, CTO, Lionbridge)

"Using our services and our technology, global businesses can quickly localize massive amounts of content while maintaining high quality. We are delighted with the early results we obtained with Amazon Translate on a translation project we launched for our client iHerb. Overall turnaround time was reduced by 67% while maintaining the same high quality standards. Our total costs were reduced in proportion, allowing us to offer our customers even more competitive prices." (Ofer Shoshan, CEO, One Hour Translation)

"At Isentia, we built our media intelligence software in a single language.
"At Isentia, we built our media intelligence software in a single language. To expand our capabilities and meet our customers' diverse language needs, we needed translation help to generate and deliver valuable insights from media content published in languages other than English. After trying many machine translation services, we were impressed by how easily Amazon Translate fits into our pipeline and by its ability to scale regardless of the volume generated. The translations are also more accurate and more nuanced, and they meet the high standards our customers expect." Andrea Walsh, chief information officer, Isentia

[Map: Locations mentioned in global news coverage monitored by GDELT 2015-2018, colored by the primary language of coverage mentioning each location. Kalev Leetaru]

Imagine a world without language barriers, where anyone can access real-time information from anywhere in the world in any language, seamlessly translated into their native tongue, and where their own writings are equally accessible to speakers of all the world's languages. Mass machine translation that eliminates barriers to information access and communication, creating a post-lingual society, has been a dream of science fiction writers since time immemorial. Yet even as the digital world increasingly eliminates geographic barriers and makes it possible to hear from an ever-greater portion of the world's citizenry, language barriers mean much of the world's information remains inaccessible.

The most basic approach to searching across languages is simply to translate keyword searches from one language to another, either through a preexisting translation reference or through machine translation. Unfortunately, the differences between languages mean that a word in one language can translate into dozens of equivalents in another, turning a simple one-word search into a massive Boolean query. Traditional machine translation systems are not typically able to provide the complete list of every possible translation of a word from one language to another. For example, type "New York" into Google Translate and you'll get New York back, while Bing Translate will offer New Yorgis. In reality, New York can be rendered fourteen different ways in Estonian: "New York", "New Yorki", "New Yorgi", "New Yorgisse", "New Yorgis", "New Yorgist", "New Yorgile", "New Yorgil", "New Yorgilt", "New Yorgiks", "New Yorgini", "New Yorgina", "New Yorgita", and "New Yorgiga". This means that robustly searching for a given word or phrase in another language will often require the assistance of a person with native fluency to craft the appropriate queries. Moreover, if the goal is to offer more than basic keyword searches, then any natural language processing algorithms will need to be designed to handle every single language of interest. Unfortunately, the dearth of training data for all but a handful of languages means that few algorithms or tools are available for most of the world's languages. The result is that despite having digital access to an almost unimaginable wealth of knowledge from across the planet, we rarely see the world beyond that captured in our own language.
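The two search strategies the article contrasts can be shown side by side. The Estonian case forms below are the fourteen listed in the text; the Boolean-query helper and the toy search over English machine translations are illustrative sketches, not GDELT code.

```python
# The fourteen Estonian surface forms of "New York" listed in the article.
ESTONIAN_FORMS = [
    "New York", "New Yorki", "New Yorgi", "New Yorgisse", "New Yorgis",
    "New Yorgist", "New Yorgile", "New Yorgil", "New Yorgilt", "New Yorgiks",
    "New Yorgini", "New Yorgina", "New Yorgita", "New Yorgiga",
]

def boolean_query(forms):
    """Expand a single keyword into the massive Boolean query the article mentions."""
    return " OR ".join(f'"{form}"' for form in forms)

def search_translations(keyword, translated_articles):
    """The inverted approach: search English machine translations directly,
    so one query matches regardless of the original Estonian case form."""
    return [a for a in translated_articles if keyword.lower() in a.lower()]

print(boolean_query(ESTONIAN_FORMS))
print(search_translations("New York", ["Officials in New York said on Monday ..."]))
```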
This can have devastating consequences, from missing the earliest warning signs of epidemics to narrowing our understanding of terrorism. When we read about conflicts or narratives in countries that speak languages other than our own, we see those stories only through the lens of those like us. We are never able to actually see the world through the eyes of others.

In contrast, what might it look like to invert this process? Imagine using massive machine translation to live-translate an ever-growing fraction of worldwide news coverage in real time. Within seconds of a news article being published somewhere on earth, it is machine translated into an intermediate semantic structure that captures its meaning in a language-agnostic form, with a live-updating language model used to generate translations of the article into any language of interest. Now keyword searches in a given language can be used to directly search the machine translations of worldwide coverage into that language, ensuring that a search in English for "New York" will return any Estonian-language article using any of the 14 forms above, which the machine translation process will have converted to "New York." Similarly, natural language processing algorithms can operate in their existing languages by simply processing the translated results in the language they were designed for. Thus, any algorithm available for English-language content can be applied directly to the English machine translations of coverage from any other language, making all the world's algorithms available for any language.

Such was the goal of my open data GDELT Project's Translingual initiative, which launched almost four years ago. Unlike traditional machine translation efforts that merely translate single documents on demand, the goal of Translingual is to translate the world's news coverage in real time, seconds after it is published, from 65 languages (soon to be over 100) representing up to 98.4% of non-English online news coverage. Every article is translated into English using an iterative contextual clarification process akin to true translation, rather than the mere "interpretation" we associate with machine "translation" today. Natural language processing algorithms natively designed for each language are run on the original content as-is, but the English translations allow GDELT to uniformly run the same algorithms across every news article, regardless of its original source language, essentially bridging the linguistic divide when it comes to automated text mining.

To understand the critical importance of machine translation in understanding the world around us, the first map below shows every distinct location in which GDELT identified a mention, among the more than 7.1 billion geographic references across 850 million worldwide news articles it monitored from 2015 to the present.

[Map: Locations mentioned in global news coverage monitored by GDELT 2015-2018. Kalev Leetaru]

The second map colors each of those points by the most common language of news coverage mentioning it (via the 65 languages GDELT currently translates from). While news coverage in languages across the world likely mentions Paris, France at least once in the course of a year, the city is most commonly mentioned in French-language news coverage, reflecting the geographic locality of journalism.
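As a toy illustration of how the "primary language" coloring just described could be derived, the sketch below takes (location, language) mention records and keeps the most common language per location. The records are invented; GDELT's actual pipeline is of course far larger.

```python
# Toy aggregation: most common coverage language per mentioned location.
from collections import Counter, defaultdict

mentions = [  # (location, language of the article mentioning it) -- invented data
    ("Paris", "fr"), ("Paris", "fr"), ("Paris", "en"),
    ("Tallinn", "et"), ("Tallinn", "et"), ("Tallinn", "ru"),
]

by_location = defaultdict(Counter)
for location, language in mentions:
    by_location[location][language] += 1

primary_language = {loc: counts.most_common(1)[0][0] for loc, counts in by_location.items()}
print(primary_language)  # {'Paris': 'fr', 'Tallinn': 'et'}
```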
[Map: Locations mentioned in global news coverage monitored by GDELT 2015-2018, colored by the primary language of coverage mentioning each location. Kalev Leetaru]

Perhaps most readily apparent in this map is how little of the world's surface is covered primarily by the English-language press. In other words, to truly understand the local stories and narratives across the world, you must look beyond English to local sources in local languages. The rich, colorful vibrancy of this map reminds us of just how diverse our shared world is and how much information we miss by focusing only on the language(s) we ourselves speak. Leveraging this model, companies are increasingly combining this mass machine translation approach with selective machine translation to expand their reach into local events and narratives.

In the end, putting this all together, we live in an era where machine translation, while far from perfect, is both scalable and accurate enough to let us machine translate the world's news coverage in real time, enabling language-agnostic searching and data mining. As machine translation continues to improve rapidly, we are increasingly able to see the world through the eyes of others.

Neural Machine Translation (NMT): artificial intelligence in the service of translation

What is neural machine translation?

Neural machine translation (NMT) is a technology based on artificial neural networks. It has made considerable progress in recent years thanks to artificial intelligence, and it can now serve as the basis for certain professional translations. Neural machine translation makes it possible to translate millions of pieces of information in real time, with accuracy and reliability now approaching those of a human being. While we are already familiar in our daily lives with machine translation software such as Google Translate, artificial intelligence is changing the game. Like the human brain, the machine is now capable not only of producing a reliable translation but also of learning a language, and thus of constantly improving the quality of what it translates.

To increase the "machine's" performance, it is trained by humans. Concretely, this means feeding the machine a very large volume of quality data (words, sentence segments, and previously translated texts) in order to improve the reliability and precision of its results. A "machine" can also be trained to meet the specific needs of a sector (legal translation, medical translation, etc.) or of a client's line of business, with its own domain vocabulary.

[Diagram: How NMT works (OpenNMT community)]

Do you have data in a foreign language that you would like translated quickly?
> Contact us for an audit of your data.

Combining technology and human translation at Ubiqus

While human translation and machine translation were long set against each other, it now makes sense to combine them. To guarantee quality translations, machine-translated content must be adapted and proofread by a professional translator. This step of verification, adaptation, and correction is called post-editing. It aims to make the final content intelligible and fluent.

Machine translation

[Image: Georges Artsrouni's mechanical brain, a translation device patented in 1933 in France.]

This page is for the 2018 offering of this course, and is here for archival purposes. This course will no longer be offered. Instead, it will be merged with Natural language understanding and Natural language generation into a new 20-point second-semester NLP course: Natural language understanding, generation, and machine translation.

Course Description

Google Translate instantly translates between any pair of over eighty human languages, like French and English. How does it do that? Why does it make the errors that it does? And how can you build something better? Modern translation systems like Google Translate learn to translate by reading millions of words of already translated text. This course will show you how they work. We cover fundamental building blocks from machine learning, computer science, and linguistics, showing how they apply to a real and difficult problem in artificial intelligence.

Time and Place

Mondays 16:10 to 17:00, Medical School, Room 425 Anatomy Lecture Theatre - Doorway 3
Thursdays 16:10 to 17:00, Medical School, Room 425 Anatomy Lecture Theatre - Doorway 3

Teaching Team

Rico Sennrich (office hours: 3:00 Mondays, Absorb Cafe, starting week 3), Alham Aji, Jonathan Mallinson, Ida Szubert, Denis Emelin. Ask us questions on Piazza. But answer questions too.

Textbook

There is no required textbook. The course will draw on recent literature from this fast-moving field. However, some background will be drawn from the following books:
Neural Machine Translation by Philipp Koehn. Available online.
Deep Learning by Goodfellow, Bengio, and Courville. Available online.
Linguistic Fundamentals for Natural Language Processing by Emily Bender. Available electronically from the university library.

Assessment

The assessment will consist of a practical coursework assignment, due in week 8 (30%), for which you are encouraged to work in pairs, and a final exam in the April/May diet (70%): April 30th, 14:30-16:30, Appleton Tower Concourse. The course will follow the school-wide late coursework policy and academic conduct policy. Past exam papers are available here.

Prerequisites

The course assumes you have taken ANLP or equivalent. Machine translation applies concepts from computer science, statistics, and linguistics. You needn't be an expert in all three of these fields (few people are), but if you are allergic to any of them you should not take this course. Concretely, you will be expected to already understand the following topics before taking the course, or be prepared to learn them independently.
Discrete mathematics: analysis of algorithms, dynamic programming, basic graph algorithms.
Other essential maths: basic probability theory; basic calculus and linear algebra; ability to read and manipulate mathematical notation including sums, products, log, and exp.
Programming: ability to read and modify Python programs; ability to design and implement a function based on a high-level description such as pseudocode or a precise mathematical statement of what the function computes.
Linguistics: basic elements of linguistic description.

Course catalogue: INFR11062 Informatics: MT
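As a pocket example of the dynamic programming background the prerequisites above call for (my illustration, not course material), here is the classic edit-distance recurrence, which also underlies MT evaluation metrics such as TER:

```python
# Edit distance by dynamic programming: d[i][j] = min edits turning a[:i] into b[:j].
def edit_distance(a: str, b: str) -> int:
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete everything
    for j in range(len(b) + 1):
        d[0][j] = j  # insert everything
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(a)][len(b)]

print(edit_distance("translation", "translating"))  # 2
```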
Machine Translation Technology

Pangeanic was the first translation company in the world to make commercial use of the statistical machine translation system Moses, as reported at the Association for Machine Translation in the Americas (AMTA) in 2010 and in the European Union project EuroMatrixPlus. Nowadays, Pangeanic's neural machine translation engines are first of their class and have been chosen by US government agencies, the European Union and Member States, and many translation companies. Dozens of corporations, businesses, and language service providers have benefited from a flexible approach that is user-centric and gives users the highest levels of control, customization, and ownership.

Pangeanic has developed and used machine translation for many applications. It has reported successful use cases for many of its clients at industry events such as Localization World Barcelona 2011, Localization World Paris 2012, and Localization World London 2013, as well as at numerous TAUS summits in the United States, Europe, and Japan, META Forum Berlin 2013, and the Japan Translation Federation.

[Figure: Pangeanic's syntax-based hybrid machine translation.]

Pangeanic was also one of the largest donors of training data to TAUS, which in turn provided access to millions of words of training corpus. This enhanced the PangeaMT platform and gave our team the opportunity to experiment further with millions and millions of aligned sentences. Machine translation has been part of company culture since 2009. Since then, machine translation services to corporations and even other translation companies have become part of Pangeanic's range of services. From 2012 to 2016, Pangeanic was a member of the EU's Marie Curie action EXPERT Project, advancing the state of the art with young and experienced researchers. PangeaMT is Pangeanic's own, independent translation technology division with a clear focus on customized, domain-specific machine translation (MT). The current version of the platform is v3.

HISTORY

As a forward-thinking and technology-savvy translation company, Pangeanic won a post-editing contract in 2007 to work for the European Commission as MT output post-editors. It was at this time that we became acquainted with institutional user needs and (re-)evaluated several commercial MT products we had been using. Soon we decided to develop our own machine translation technology. Pangeanic was cited as the first language service provider to make commercial use of Moses in the EU's Framework development program euromatrixplus.net (the second, more polished release of Moses). Since then, many presentations, awards, and implementations have followed, and Pangeanic has made a name for itself as a leading machine translation implementation company. It also markets its machine translation services in areas beyond the translation industry and has been heavily involved in two more EU machine translation R&D programs, EXPERT and Casmacat (User Group). Pangeanic obtained the biggest contract for machine translation infrastructures for the European Commission (2017) with its iADAATPA project. Neural machine translation technology has been integrated into Pangeanic's workflow to give its clients faster translation turnarounds. Its neural network-based engines also serve EU projects, US government agencies, and international companies, in the cloud and on premises.

FOCUS

We began as keen followers of the statistics-driven paradigm of machine translation. This worked very well for several related language pairs (Romance languages and English; German and Scandinavian languages). However, our links to Japanese industry soon brought requests to add Japanese and Chinese to our service portfolio. In 2011, Pangeanic developed hybrid machine translation services, which were included as part of the system features.

FEATURES

Despite our Moses bias, we have been able to overcome many of Moses' shortcomings in order to fit the needs of the translation industry: our solutions go beyond text-based MT and can take input and produce output in industry standards such as TMX and XLIFF. PangeaMT provides API access to other translation platforms, so you do not need to change your translation environment and can benefit from feeding your future translations back in a virtuous re-training cycle. Using open standards means that you will never have to buy expensive TM software again; our solutions simply avoid having you locked in by expensive upgrades year after year.

Another PangeaMT breakthrough is our inline mark-up parser. PangeaMT handles tags extremely efficiently. Statistical machine translation systems (as they come from open-source releases) usually produce plain-text output, because this is also the format they process. However, we are keen to see PangeaMT solutions in use and adapted to the most demanding language-industry requirements, so we focused our effort on developing SMT engines capable of handling the in-line coding typical of other content formats used in localization production environments. Thanks to this parser, PangeaMT can identify in-lines without attempting to translate them, and it places them back in the resulting text, too. An in-line placeholder acts first by copying and transferring all XML and code information to a separate module. The translation engine does its work, and the in-line is then placed back into the translated segment.
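Pangeanic's parser is proprietary, but the placeholder mechanism just described can be sketched in a few lines: tags are stashed before translation and restored afterward. Everything here, including the placeholder format, is an assumption for illustration.

```python
# Sketch of inline mark-up handling: stash tags as placeholders, translate, restore.
import re

TAG = re.compile(r"<[^>]+>")

def extract_tags(segment):
    tags = []
    def stash(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"  # hypothetical placeholder format
    return TAG.sub(stash, segment), tags

def restore_tags(translated, tags):
    for i, tag in enumerate(tags):
        translated = translated.replace(f"__TAG{i}__", tag)
    return translated

source, tags = extract_tags("Press <b>Start</b> to begin.")
# ... the MT engine translates `source`, leaving the placeholders intact ...
translated = "Appuyez sur __TAG0__Démarrer__TAG1__ pour commencer."
print(restore_tags(translated, tags))  # Appuyez sur <b>Démarrer</b> pour commencer.
```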
At the time of its release, our in-line parser constituted an innovation well above the level of maturity of well-known SMT systems. We keep learning and improving with every development commissioned by an existing or new client and language combination. We therefore remain open to applying new hybridization techniques, even ad-hoc rules, which we research and implement ourselves or co-develop with our clients. We are aware that for some language combinations it will be necessary to resort to linguistically informed techniques as part of the pre- or post-processing phases. Getting word and phrase reordering right in the MT output is not an easy goal, especially when the languages involved are not closely related from a linguistic-family standpoint, or when one of them has a really flexible, and therefore MT-challenging, word order. Some language-specific fixing procedures may come in handy. In other cases, it may be useful to use one language as a pivot to train engines between languages that are not close. These and other techniques may be used, or taken as a basis, for expanding our PangeaMT solution palette. Please visit our machine translation division website to learn more about PangeaMT.

Neural Machine Translation

It is generally agreed that neural machine translation (NMT) has surpassed statistical machine translation (SMT) in terms of the fluency and adequacy that humans perceive when reading the texts the software produces. NMT uses a large artificial neural network with thousands of connections, loosely resembling what happens in the human brain. One of the main advantages of NMT is that the context it considers is much longer than in SMT, which translates at the phrase level. Currently, developers mostly use sequence-to-sequence approaches, in which the full context of the sentence is taken into account. Accuracy and fluency of the translations increase with the use of NMT. Other advantages of NMT with respect to SMT are that NMT requires only a fraction of the memory needed by SMT, and that all parts of an NMT model are trained jointly (end to end) in order to maximize translation performance. Pangeanic is at the forefront of research and development of translation technologies incorporating NMT, embedding it in different processes.

Statistical and neural network automatic translation software

Portage helps translators boost productivity and improve the quality of their work by generating automatic translations that draw on their own documents. Using statistical machine learning technology, Portage creates ever more accurate translations the more it is used. Because Portage uses your archives rather than external resources, the translations it generates are considerably more accurate than those of other automatic translation systems. For each sentence translated by Portage, a confidence index is produced, which allows users to filter translated output based on quality. To obtain sufficient quality, we recommend training Portage on a corpus of at least 5 million words.

Systems Comparison

Many clients report having translated entire documents with an accuracy rate of 70-80% in certain subject areas. This kind of accuracy means that a language professional equipped with Portage can realistically expect to perform revision, rather than translation.
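A per-sentence confidence index like Portage's naturally supports a routing workflow: high-confidence output goes to revision, the rest to full retranslation. The threshold, data shapes, and sample segments below are assumptions for illustration, not Portage's actual interface.

```python
# Sketch: route MT output by a per-sentence confidence index.
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off

segments = [  # (machine translation, confidence index) -- invented examples
    ("Le rapport est prêt.", 0.93),
    ("La configuration du pipeline a échoué.", 0.55),
]

to_revise = [text for text, conf in segments if conf >= CONFIDENCE_THRESHOLD]
to_retranslate = [text for text, conf in segments if conf < CONFIDENCE_THRESHOLD]
print("revise:", to_revise)
print("retranslate:", to_retranslate)
```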
Portage is integrated into LogiTerm, Terminotix's computer-aided translation software. You can use LogiTerm's pretranslation settings to activate Portage when working with LogiTerm's pretranslation engine: if a match is not found for a given text segment, LogiTerm will display an automatic translation for the segment (if a specified correspondence threshold is met). Portage supports the TMX and SDLXLIFF file formats and retains output formatting codes. It also features a SOAP interface, which allows for integration with any computer-aided translation software or any other platform. Portage software can be installed on your servers or hosted on Terminotix's servers.

Title: Generative Neural Machine Translation
Authors: Harshil Shah, David Barber (submitted 13 Jun 2018)
Abstract: We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1806.05138 [cs.CL] (or arXiv:1806.05138v1 [cs.CL] for this version)
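For orientation, latent-variable translation models of the kind the abstract describes are typically trained by maximizing an evidence lower bound (ELBO). One plausible shape for it, writing x for the source sentence, y for the target, and z for the latent representation, is the generic variational form below; this is not necessarily the paper's exact objective.

\[
\log p_\theta(x, y) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid z) + \log p_\theta(y \mid x, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p(z)\big)
\]

The KL term regularizes the latent representation, which is consistent with the reduced overfitting on limited paired data that the abstract reports.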
Is Neural Machine Translation Ready for Marketing Content?
By Terena Bell | Sep 28, 2018

When Google Translate first hit the market, it wasn't very good. Music fans were among the first to prove this, making a laughingstock of the app by loading in lyrics from songs like Will Smith's "Fresh Prince of Bel-Air" and the theme song from Moana to see what funny or ridiculous translations Google would generate. While the tool isn't nearly as bad as the videos make it out to be, this negative PR has kept companies from using it. After all, if Google can't translate song lyrics correctly, why would you trust it with marketing content? But Google Translate doesn't represent all machine translation; it is simply a brand that happens to be well known and free. And in translation, just like everything else, you get what you pay for.

Machine Translation Post-Editing, MT+PE

Professional (not free) machine translation helps companies translate more content more quickly and at a lower price point. And when a human reviewer checks the work, quality is found to be just as high as that of traditional translation. The language industry calls this pairing machine translation post-editing, or MT+PE for short. Traditionally, when you buy translation, your content goes through two rounds: the first person converts it into the new language, then a second checks for errors. With MT+PE, the computer takes the first pass, speeding up the entire process. This is what makes machine translation okay for certain marketing material, but not for anything too highly nuanced, cautions Rick Antezana of the Association of Language Companies. "Paid machine translation should be used when translating content of high volume and low risk/low importance," he says.

It all depends on the training. Words are stored in what computer programmers call an engine. Machine translation engines must be specifically trained for the old and new languages, as well as for the subject matter you need. As with all artificial intelligence or machine learning, machine translation needs data to improve. If your company's tech team uses the engine to translate user questions, it'll be good at support language but not marketing. And just because an engine is trained in the language you need doesn't mean it can translate in the right direction: Spanish into English has traditionally required a different engine than English into Spanish.

Enter Neural Machine Translation

But that's changing. Laura Brandon, former director of the Globalization and Localization Association, says, "The big development these days is neural machine translation, which is blowing other machine translation out of the water." Using neural network technology (a type of machine learning designed to mimic neurons in the human brain), neural machine translation can train in multiple languages and directions at once. Separate engines are no longer needed, saving your company precious training time. So is neural network technology better at translating marketing content? Not necessarily. "When representing any kind of brand — whether it be an enterprise-level, global company or a small software company — using machine translation to represent any kind of content, including marketing content like social is a big gamble, as the software is incredibly well developed, but never perfect," Antezana says. "The fewer eyeballs on potential content for translation and the higher the volume, the more appropriate it would be."
JMIR Public Health Surveill. 2015 Jul-Dec; 1(2): e17. Published online 2015 Nov 17.
doi: 10.2196/publichealth.4779. PMCID: PMC4869219. PMID: 27227135

Machine Translation of Public Health Materials From English to Chinese: A Feasibility Study

Monitoring editor: Gunther Eysenbach. Reviewed by Daniel Capurro, Yoonsang Kim, and Barbara Massoudi.

Anne M Turner, MD, MLIS, MPH (corresponding author),^1 Kristin N Dew, MS,^2 Loma Desai, MS, MBA,^3 Nathalie Martin, BA,^4 and Katrin Kirchhoff, PhD^5

^1 Northwest Center for Public Health Practice, Department of Health Services, University of Washington, Seattle, WA, United States
^2 Northwest Center for Public Health Practice, Human Centered Design & Engineering, University of Washington, Seattle, WA, United States
^3 Northwest Center for Public Health Practice, Information School, University of Washington, Seattle, WA, United States
^4 Northwest Center for Public Health Practice, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
^5 Speech, Signal and Language Interpretation (SSLI) Lab, Department of Electrical Engineering, University of Washington, Seattle, WA, United States

Corresponding author: Anne M Turner, Northwest Center for Public Health Practice, Department of Health Services, University of Washington, Suite 400, 1107 NE 45th Street, Seattle, WA, 98105, United States. Phone: 1 206 491 1489. Fax: 1 206 616 5249. Email: amturner@uw.edu.

Received 2015 May 29; revisions requested 2015 Jul 8; revised 2015 Aug 18; accepted 2015 Oct 7.

Copyright © Anne M Turner, Kristin N Dew, Loma Desai, Nathalie Martin, Katrin Kirchhoff. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 17.11.2015. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.

Abstract

Background

Chinese is the second most common language spoken by limited English proficiency individuals in the United States, yet there are few public health materials available in Chinese.
Previous studies have indicated that the use of machine translation plus postediting by bilingual translators can generate quality translations in less time and at lower cost than human translation.

Objective

The purpose of this study was to investigate the feasibility of using machine translation (MT) tools (eg, Google Translate) followed by human postediting (PE) to produce quality Chinese translations of public health materials.

Methods

From state and national public health websites, we collected 60 health promotion documents that had been translated from English to Chinese through human translation. The English versions of the documents were then translated to Chinese using Google Translate. The MTs were analyzed for translation errors. A subset of the MT documents was postedited by native Chinese speakers with health backgrounds. Postediting time was measured. Postedited versions were then blindly compared against human translations by bilingual native Chinese quality raters.

Results

The most common machine translation errors were errors of word sense (40%) and word order (22%). Posteditors corrected the MTs at a rate of approximately 41 characters per minute. Raters, blinded to the source of translation, consistently selected the human translation over the MT+PE. Initial investigation into the reasons for the lower quality of MT+PE indicates that poor MT quality, lack of posteditor expertise, and insufficient posteditor instructions can be barriers to producing quality Chinese translations.

Conclusions

Our results revealed problems with using MT tools plus human postediting for translating public health materials from English to Chinese. Additional work is needed to improve MT and to carefully design postediting processes before the MT+PE approach can be used routinely in public health practice for a variety of language pairs.

Keywords: public health informatics, public health, natural language processing, machine translation, Chinese language, health promotion, public health departments, consumer health, limited English proficiency, health literacy

Introduction

A key role of public health departments is to inform and educate the public on issues of public health importance. Health departments produce health promotion materials on a range of topics, such as environmental health, communicable diseases, immunizations, and maternal-child health, and the Internet has become a key mechanism by which they distribute and disseminate this information. Although federal and state regulations require that health materials be made available in the languages of patients, due to the time and costs required to manually produce quality translations, very few of these materials are available in languages other than English [1]. Therefore, individuals with limited English proficiency (LEP) have limited access to this health information. This is of particular significance given that LEP status is associated with poor health literacy and negative health consequences, including documented health disparities such as poorer health outcomes and poorer access to health care and preventive services compared to English-speaking minorities [2-4].

Machine translation (MT), the automatic translation of text from one human language into another by a computer program, has been an area of study within natural language processing for several decades. State-of-the-art MT tools use a statistical machine translation (SMT) framework. This approach uses large amounts of parallel text for the desired language pair to train SMT models. During testing, an SMT engine then produces the most likely translation under the statistical model.
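Stated compactly, the classical decision rule behind this SMT framework is the noisy-channel formulation: given a foreign sentence f, the decoder searches for the English sentence e maximizing

\[
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} \; P(f \mid e)\, P(e),
\]

where the translation model P(f | e) is estimated from the parallel text and the language model P(e) from monolingual text.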
While MT tools have improved greatly over the last 5 years, and MT is now routinely used by many language service providers, the quality of raw MT output generally falls short of human-generated translation (HT). In order to produce quality translations, MT errors need to be corrected by human readers who have domain expertise and are fluent in the source and target languages. This correction, called postediting (PE), can range from light to heavy editing. It has been shown that MT+PE increases productivity (ie, it can be completed more quickly than producing an entirely new HT) both for translators and for lay users [5]. However, postediting is a cognitively different process from translating, and postediting results depend strongly on posteditor skill, attitudes toward machine translation, the difficulty of the source document, and the quality of the initial machine translation output [5,6].

Our previous research indicates that freely available MT tools, such as Google Translate and Microsoft Translator, can be used in conjunction with human PE to produce quality translations efficiently and at low cost [7,8]. We compared the time and cost of HT versus MT+PE for Spanish public health documents, using health professionals as posteditors [7]. Posteditors corrected 25 machine-translated public health documents. Pairs of HT and MT+PE were blindly presented to 2 bilingual public health professionals, who were asked to rate which of the translations they preferred. In this blinded rating, the HT and MT+PE were found to be overall equivalent (33% HT preferred, 33% MT+PE preferred, 33% both translations considered equivalent).

These previous studies were conducted on a single language pair, English-Spanish. SMT generally works best when the source and target languages have similar sentence structures, as in the case of English-Spanish. In order to assess the broader usefulness of MT technology in public health departments, it is necessary to determine whether these results generalize to a wider set of language pairs, specifically those with very divergent linguistic structures. One such pair, English-Chinese, is of particular interest since Chinese is the second most common language spoken by LEP individuals in the United States, representing 6.1% of the LEP population [9]. We conducted postediting experiments, similar to those conducted for the English-Spanish pair, in order to determine the feasibility (accuracy and efficiency) of using MT+PE for translating public health documents from English to Traditional Chinese. We investigated the types of MT errors occurring in Chinese, the PE time needed to correct them, and the quality of MT+PE compared to HT, as rated by raters fluent in both English and Traditional Chinese. In this paper, we discuss the results of these investigations and compare them to our previous experience with the English-Spanish pair. This work contributes to our understanding of the challenges involved in applying the MT+PE approach in a public health setting.

Methods

Initial Steps

We collected 60 health promotion documents from different public health agencies in the United States that had been translated manually (HT) from English to Chinese.
Translations were created using the Traditional Chinese character set, as opposed to Simplified Chinese, because this is the form known to most Chinese LEP individuals in the Pacific Northwest region. We identified the types of linguistic errors present in MT from English to Chinese and then conducted postediting of the translated materials with participants fluent in both languages. Next, we had bilingual public health professionals and laypersons rate the quality of the human translations versus the MT plus postedited documents. A diagram of the study design is shown in Figure 1. A more detailed description of the specific methods for the linguistic error analysis, the postediting and rating studies, and the follow-up evaluation is provided below.

[Figure 1. Study design overview.]

Linguistic Error Analysis

We collected 60 health promotion documents available in English and Chinese (Traditional) from public health websites in the United States. Websites included those of the Centers for Disease Control and Prevention, New York City Department of Health and Public Health, Minnesota Department of Health, Washington State Department of Health, Department of Public Health - Los Angeles County, and Public Health - Seattle & King County. All Chinese versions of these documents had been translated manually (HT) by health department translators or professional translation vendors. The English versions of the documents were then translated into Traditional Chinese using Google Translate. We developed a categorization scheme for MT errors, and all MTs were annotated based on this scheme by a native Chinese speaker with formal training in linguistics. Subsequently, aggregate error statistics were computed to identify the most frequent error categories: word sense, word order, missing word, superfluous word, orthography/punctuation, particle error, untranslated word, pragmatic error, and other grammar error.

Postediting Experiments

For the postediting studies, we selected 25 of the 60 health documents that had been machine translated from English to Chinese using Google Translate. To ensure a wide representation of topics, we selected the documents based on the length of the English version (340-914 words) and topic area. From the memberships of local Chinese cultural organizations, 6 Chinese translators were recruited for postediting and screened for language ability and health experience. Posteditors, all native Chinese speakers, were fluent in oral and written Traditional Chinese and English, had varying levels of translation experience, and had prior experience in a health-related field (Table 1).
Table 1. Initial postediting and quality rating participants, and their health and translation experience.
(Columns: participant number | role | health background | translation experience)
P1 | Posteditor | Pharmacy student | Limited: translating at health fairs
P2 | Posteditor | Social work for Chinese population, including health care support | Teaching English as a second language and translating research
P3 | Posteditor and quality rater | Public health researcher | 10 years of various translation experience
P4 | Posteditor | Social work for Chinese population, including health care support | Translating agency and government publications for distribution to clients
P5 | Posteditor | Public health student | None
P6 | Quality rater (posteditor for follow-up evaluation only) | Public health translator | DSHS Certified Medical Interpreter

The 25 machine-translated documents were each corrected by at least 2 posteditors in order to permit consistency checks across posteditors and computation of average time, adequacy, and fluency ratings per document. Posteditors used a proprietary MT and postediting tool built for the purpose of this study, as described previously [7]. Each posteditor corrected between 4 and 21 documents representing common types of public health materials, including informational webpages, agency letters, fact sheets, and brochures. Posteditors were allowed to choose their preferred character input method: one posteditor used a pinyin keyboard called Q9, while the rest used the standard Windows OS pinyin input.

The postediting tool displays three versions of the text from left to right in one window: the original English text, the MT, and the editable MT. When a posteditor clicks the editable MT field to begin editing, a timer starts. The tool saves the total editing time (minus pauses), the keystrokes, and a copy of the postedited machine translation. Time and keystroke data were collected for all postedited documents. Due to a posteditor saving error, only 24 of the 25 postedited documents were output in a readable format and therefore available for rating.

Posteditors were given written and verbal instructions to "perform all corrections necessary to ensure that the text (1) is consistent with the grammar rules of Chinese, (2) adequately represents the meaning of the English text, (3) is culturally appropriate (ie, not unintentionally funny or offensive), and (4) preserves the linguistic style of the source document." Posteditors were asked not to alter a correct, appropriate translation simply because it might not correspond to their first choice of translation. In short, they were instructed to correct only as much as necessary and not to rewrite the text. These were the same instructions used in the previous Spanish study.

After completing postediting, participants were asked to fill out a questionnaire rating the adequacy and fluency of each MT+PE on a scale of 1-5. These rating scales are common in human evaluations of machine translation quality [10]. An adequacy of 1 indicated that none of the original meaning of the English source text was retained in the MT, while an adequacy of 5 indicated that all of the meaning was retained. A fluency rating of 1 indicated that the MT was incomprehensible, while a rating of 5 indicated flawless Chinese. The questionnaire also asked participants to describe the common translation errors they found, identify which errors were most difficult to correct, and explain which errors took the longest to correct.
Quality Rating

Two public health professionals, blinded to the method of translation, compared the quality of the postedited documents to that of the HT documents from the health department websites. The quality raters were asked to rate the MT+PE against the HT versions. One rater was a professional public health translator and Department of Social and Health Services Certified Medical Interpreter at a local clinic; the other was a health researcher (Table 1). They were presented with 20 sets of documents selected from the 24 available, each set containing an original English text, an HT version of that text, and an MT+PE version of the text. Although one rater had participated in the initial postediting study as well, she did not rate documents that she had encountered while postediting. The documents were not labeled as human- or machine-translated, and the order in which they were presented in each set was randomized. Using a questionnaire, we asked the quality raters to read each set carefully, indicate which of the translated versions they preferred, and describe why they chose that version, based on five dimensions: grammar, adequacy, word choice, cultural appropriateness, and reading level.

Follow-Up Evaluation

After analyzing the results of the quality rating study, we performed follow-up evaluations of the effects of posteditor expertise, engagement, and instructions on the quality of postedited translations. To assess whether posteditors' public health and translation expertise affected the quality rating outcome, we asked P6, a highly trained and experienced health translator, to postedit four documents. We then repeated the quality rating procedure with those documents, asking five native Chinese speakers to review them. To test posteditor engagement, and whether the instructions to edit only as necessary were problematic, we asked 3 posteditors (P2, P4, and P5) to return and edit a total of 10 more documents, this time with instructions to make as many corrections as needed to ensure the quality of the translation. We again repeated the quality rating procedure with one native Chinese speaker with public health experience to see whether posteditors given the revised instructions would produce text equivalent to the HTs.

Results

Linguistic Error Analysis

Results from the linguistic error analysis are summarized in Table 2. The left-hand column shows the error type; the right-hand column shows the corresponding frequency, computed as the percentage of all errors annotated in the total set of 60 documents. For example, word sense errors (errors where the word meaning was translated incorrectly) constituted 40% of all annotated errors. The next most common error types involved word order (22%) and missing words (16%).

Table 2. Error categories and their distributions.
Word sense: 40%
Word order: 22%
Missing word: 16%
Superfluous word: 14%
Other grammar error: 3%
Orthography/punctuation: 3%
Particle error: 1%
Untranslated word: 0.03%
Pragmatic error: 0.01%

Postediting Experiments

The proprietary postediting tool recorded the time taken to postedit each machine-translated document. We analyzed the time taken, by document and by posteditor, and examined posteditors' quality ratings of the initial MT output. A list with descriptions of the source documents is provided in Multimedia Appendix 1.
To determine and analyze the amount of time required for postediting, we calculated the number of characters per minute (CPM) for each document and then computed means and standard deviations (SDs) in CPM for each document, using posteditors' recorded times. In addition, we computed means and SDs in CPM for each posteditor (Table 3). This helped us gain insights into potential correlations between postediting time and document topic, length, and so on, as well as differences between posteditors (though not all posteditors edited the same number of documents).

Table 3. Postediting time, adequacy, and fluency ratings by posteditor.
(Columns: posteditor | docs postedited, n | CPM, mean (SD) | avg. adequacy | avg. fluency)
P1 | 9 | 34.2 (7.3) | 4 | 3.2
P2 | 21 | 35.4 (16.2) | N/A | N/A
P3 | 4 | 25.8 (10.2) | 3 | 2.5
P4 | 4 | 54.3 (40.5) | 3.25 | 3.25
P5 | 11 | 54.0 (16.0) | 3.875 | 3.75
P6 | 4 | 20.6 (3.7) | 1.75 | 1.625

The mean CPM per document varied greatly, from 18.5 to 79.6 CPM (SD 0.03-38.7). The total mean CPM across all documents was 37.8 (SD 10.2). Thus, on average, a posteditor corrected approximately 38 CPM, with a variation of around 10 CPM. The results did not indicate a linear relationship between document length and average postediting time. We also found no relationship between document type and average CPM.

On average, the posteditors rated the adequacy of the translations at 3.32 (SD 0.90), suggesting that much of the original meaning of the source text was preserved in the MT. The average fluency rating was 3.0 (SD 0.84), which corresponds to the grammar quality level of non-native Chinese. The average adequacy and fluency ratings bore no relationship to document type or length, but varied greatly by individual posteditor. Interestingly, the posteditors with more experience in translation and health rated adequacy and fluency lower than did their less experienced counterparts (Table 3).

To investigate the variation in postediting speed between individuals, we calculated the average CPM for each posteditor. As shown in Table 3, the average CPM was 37.4 and the average SD for CPM per document was 15.7. We also found large individual differences in speed among posteditors [11,12]. Posteditors also varied widely in their adequacy and fluency ratings, with a trend indicating an inverse relationship between public health translation experience and ratings: the posteditors with more translation and public health expertise tended to rate the documents they postedited lower than did those with less experience (Tables 1 and 3).

Errors described by posteditors as difficult to correct, or annoying, included word sense errors and word order errors. Some examples of the errors noted by posteditors are provided in Table 4.

Table 4. Posteditor examples of the top three error categories.
Word sense: "The literal meaning changes when translated into Chinese (eg, lost power/electricity is translated as lost 'energy')"
Word order: "'...when...can't...' type of sentence doesn't have same structure in Chinese. The order of the words change in Chinese and English in many situations"
Missing word: "Whenever there is the word 'person' we should mention 'this' or 'that' person, otherwise it is not clear who are we talking about in the sentence."
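To make the bookkeeping behind Table 3 concrete, the sketch below computes per-posteditor mean and standard deviation in characters per minute from the kind of (posteditor, characters, minutes) records the postediting tool logged. The log entries are invented for illustration, not the study's data.

```python
# Per-posteditor CPM statistics from (posteditor, characters, active minutes) logs.
from statistics import mean, stdev

logs = [  # invented example records
    ("P1", 1200, 33.5), ("P1", 900, 28.0),
    ("P5", 1500, 27.1), ("P5", 1100, 21.3),
]

by_editor = {}
for editor, chars, minutes in logs:
    by_editor.setdefault(editor, []).append(chars / minutes)

for editor, cpms in sorted(by_editor.items()):
    sd = stdev(cpms) if len(cpms) > 1 else 0.0
    print(f"{editor}: mean {mean(cpms):.1f} CPM (SD {sd:.1f})")
```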
Quality Rating

Unlike our previous experience with English to Spanish translations, in a blind comparison of HT and MT+PE, the quality raters selected the HT document as the preferred version for all 20 documents. Reasons given for preferring the HT versions included better word order, a more professional reading level, smoother flow, more accurate word choice, and preservation of the meaning and cultural appropriateness of the original English document. The reasons the raters gave for rejecting the MT+PE documents were that they did not meet the reading level of the general public, that some sentences lost the intended meaning, that the same words were not translated consistently, and that they contained awkward word order, occasional wrong word translations, and awkward flow.

Follow-Up Evaluation

In theory, if posteditors have sufficient training, experience, and resources to perform quality postediting, MT+PE documents should be equivalent to HT documents. The feasibility of MT+PE has been demonstrated repeatedly in previous studies across a variety of language pairs, and the procedure is widely used by commercial language service providers. In previous work with the Spanish-English language pair, we found our approach feasible even among lay users with minimal training; these conditions closely mirror the public health context, where resources for training and calibration are limited. There are several potential reasons for the preference for the HT over the MT+PE in this study:

Differences in MT Quality

Chinese machine translations have a different relative frequency of certain error types and lower quality overall. Compared to our previous studies on English-Spanish [8,13], we found that the Chinese translations had high percentages of word order and word sense errors, which require more cognitive effort to correct [14-16]. Adequacy and fluency ratings were also lower than for the Spanish translations: adequacy for Chinese was 3.3 compared to 4.2 for Spanish; fluency was 3.1 versus 3.7 for Spanish. It should be noted that these scores are not directly comparable, since the sets of English documents used in the two studies were not identical; however, the differences in scores are consistent with the common observation in the MT community that MT for English-Chinese is less effective than for English-Spanish.

Instructions Provided to Posteditors

Posteditors might have misinterpreted the postediting instructions. Specifically, the instruction to "postedit only where necessary" and to not "rewrite" might have led them to produce fewer edits than they would under real-life circumstances. Quality raters observed that the postedited documents often contained very literal, word-by-word translations that were perceived as unacceptable. In language pairs with similar linguistic structures (like English and Spanish), relatively literal translations may still yield acceptable output, whereas fluent Chinese requires the translator to depart more strongly from a literal translation. Due to time and resource constraints, as in our prior studies, there was also no extensive training and calibration phase for the study participants. Combined with the lower quality of the initial Chinese MT versions, the postediting instructions might explain the lower quality of the postedited Chinese translations as compared to the Spanish translations.

Linguistic Expertise of Posteditors

Although posteditors were selected for bilingual competence and familiarity with the domain of public health, they did not have to undergo initial language or translation tests to verify their editing abilities.
Engagement of Posteditors

Posteditors may not have been sufficiently engaged in the task, or they may have optimized for time rather than quality.

Different Levels of Quality Control

Only one round of postediting was performed, followed by the quality rating task. We do not know how many iterations of editing and quality control were applied to the human-generated translations, since they were collected from different sources whose translation processes were not transparent. Our prior investigations into health department translation processes revealed that most of the public health HT documents had been translated in-house or by language service providers who conduct several rounds of postediting and review before making them public [7].

Additional Follow-Up

To ascertain the contribution of these factors to the overall results, we conducted additional follow-up studies investigating the role of posteditor expertise, instructions, and engagement.

Expertise

To assess whether posteditor expertise played a role in the translation quality, we engaged the services of a public health professional who performed translation for a large metropolitan health department in Washington State (P6). She was given the original set of instructions: to correct only as much as needed and not to rewrite the text extensively. She postedited four documents, which were then given as a set and blindly rated against their original human translations by five native Chinese speakers, so that each rater reviewed all four documents. Three of the five raters selected the human translation over the MT+PE version for all four documents; the other two raters judged one of the HT/MT+PE pairs to be equivalent.

Instructions and Engagement

To test whether our instruction to postedit only where necessary played a role in the MT+PE ratings, we modified the instructions to emphasize quality and recruited three posteditors to come back for another postediting session. The original instructions, as adapted from the Spanish study, directed posteditors not to alter a correct translation even if it was not their first choice, not to engage in extensive rewriting of the text, and not to spend an extended period of time looking up grammar, punctuation, or unfamiliar terminology online. The updated instructions directed posteditors to use as much time and effort as necessary to ensure a high-quality translation. The three returning posteditors corrected a total of 10 documents, which were then blindly rated by a quality rater with language and public health expertise. As anticipated, posteditors took longer to produce the MT+PE translations under the updated instructions: P2's average speed dropped from 35.38 to 23.43 CPM, P4's fell from 54.33 to 17.46 CPM, and P5's decreased from 53.96 to 19.69 CPM. The rater chose the human translations for 6 of the 10 documents and rated the other 4 as equivalent, a notable improvement over the results under the original instructions.

Discussion

Principal Findings

Although our prior research on English to Spanish translation indicated that MT+PE could produce translations of equivalent quality in less time and at lower cost, our current study on the English-Chinese language pair showed that maintaining quality through postediting was more problematic.
Translation between English and Chinese is challenging because of very divergent syntactic structures (eg, topic-comment structure in Chinese vs subject-verb-object structure in English), frequent dropping of pronouns in Chinese, richer morphology in English, and other linguistic differences. Compared to a language pair like English and Spanish, SMT for English and Chinese generally tends to produce lower-quality results (eg, in the benchmark evaluations for different language pairs conducted by the US National Institute of Standards and Technology [17]).

Strengths and Limitations

In theory, professional translators with sufficient training and time should be able to produce an equivalent product by postediting MT output. In practice, even with instructions to take the time needed for a best-quality translation, the final postedited translations still contained obvious errors that led the quality raters to prefer the HTs in most cases. The experienced translators rated the adequacy and fluency of the MT+PE lower in general than their less experienced counterparts did, and commented that for many machine-translated sentences it would have been easier to translate the English from scratch than to correct the MT version. However, it should be noted that our prior evaluation of health department translation processes found that HT documents undergo multiple editing cycles to ensure translation quality and cultural appropriateness. In the studies reported here, the machine-translated documents underwent only one round of postediting. It is likely that additional rounds of editing would further improve the MT+PE product. Another possible limitation of our study is the use of a single translation engine, Google Translate. However, most SMT systems are based on the same set of underlying statistical models, suggesting that the types and relative frequencies of translation errors would not have been significantly different had a different SMT system been used. Additional work is needed to improve the quality of MT from English to Chinese; word sense and word order errors require the most attention. Our team is currently working to reduce these errors. In addition, particular care must be taken in selecting posteditors, documents, and machine translation engines, and in designing postediting instructions and quality control processes.

Conclusion

In the United States, Chinese is the second most common language spoken by LEP individuals and the most common character-based language. However, because of the resources and time involved in human translation, health departments currently offer few health promotion materials in Chinese. Our investigation into the use of MT+PE to produce translations indicates that the methods that worked for English to Spanish translation were not as effective for translation from English to Chinese. Multiple factors, including the quality of the MT and the expertise of the posteditors, may have contributed to these results. Our preliminary follow-up studies suggest that reducing word sense and word order errors would improve English to Chinese MT, and that additional training and expertise of bilingual posteditors may be needed to successfully apply online MT technology to public health practice. We are performing additional studies to determine how best to improve translation from English to Chinese in order to ensure quality translation at low cost.
Acknowledgments

The research reported here was supported by the National Library of Medicine of the National Institutes of Health (NIH) under award number R0110432704. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The images used in Figure 1 were created by Hadi Davodpour, Edward Boatman, and Lauren Manninen for the Noun Project. We would also like to thank Beryl Schulman and Julie Loughran for reviewing this manuscript.

Abbreviations

CPM: characters per minute
HT: human translation
LEP: limited English proficiency
MT: machine translation
NIH: National Institutes of Health
PE: postediting
SMT: statistical machine translation

Multimedia Appendix 1: Study source documents and postediting times.

Conflicts of Interest: None declared.

References

1. Turner A, Capurro D, Kirchhoff K. The availability of translated public health materials for limited English proficiency populations in Washington State. 3rd Annual Health Literacy Research Conference; 2011; Chicago, IL. http://www.bumc.bu.edu/healthliteracyconference/files/2011/07/Poster-Abstracts-Packet.pdf
2. Raynor EM. Factors affecting care in non-English-speaking patients and families. Clin Pediatr (Phila). 2015 May 11. doi: 10.1177/0009922815586052.
3. Ponce NA, Hays RD, Cunningham WE. Linguistic disparities in health care access and health status among older adults. J Gen Intern Med. 2006 Jul;21(7):786-91. doi: 10.1111/j.1525-1497.2006.00491.x.
4. Sentell TL, Tsoh JY, Davis T, Davis J, Braun KL. Low health literacy and cancer screening among Chinese Americans in California: a cross sectional analysis. BMJ Open. 2015;5:1-9.
5. Aranberri N, Labaka G, Diaz de Ilarraza A, Sarasola K. Comparison of post-editing productivity between professional translators and lay users. Third Workshop on Post-Editing Technology and Practice; October 2014; Vancouver, BC, Canada. pp. 20-33. http://www.amtaweb.org/AMTA2014Proceedings/AMTA2014Proceedings_PEWorkshop_final.pdf
6. Koehn P, Germann U. The impact of machine translation quality on human post-editing. Workshop on Humans and Computer-Assisted Translation; 2014; Gothenburg, Sweden. pp. 38-46. http://www.aclweb.org/anthology/W14-0307.pdf
7. Turner AM, Bergman M, Brownstein M, Cole K, Kirchhoff K. A comparison of human and machine translation of health promotion materials for public health practice: time, costs, and quality. J Public Health Manag Pract. 2014;20(5):523-529. doi: 10.1097/PHH.0b013e3182a95c87.
8. Kirchhoff K, Turner AM, Axelrod A, Saavedra F. Application of statistical machine translation to public health information: a feasibility study. J Am Med Inform Assoc. 2011;18(4):473-478. doi: 10.1136/amiajnl-2011-000176.
9. Pandya C, Batalova J, McHugh M. Limited English Proficient Individuals in the United States: Number, Share, Growth, and Linguistic Diversity. Migration Policy Institute; 2011.
http://www.immigrationresearch-info.org/report/migration-policy-institute/limited-english-proficient-individuals-united-states-number-share-
10. Linguistic Data Consortium. Linguistic Data Annotation Specification: Assessment of Fluency and Adequacy in Translations, Revision 1.5. 2005 Jan 25. https://www.ldc.upenn.edu/collaborations/past-projects
11. Guerberof A. Productivity and quality in MT post-editing. Machine Translation Summit XII; August 2009. http://www.mt-archive.info/MTS-2009-Guerberof.pdf
12. Guerberof A. Correlations between productivity and quality when post-editing in a professional context. Machine Translation. 2014 Nov 20;28(3-4):165-186. doi: 10.1007/s10590-014-9155-y.
13. Kirchhoff K, Capurro D, Turner AM. A conjoint analysis framework for evaluating user preferences in machine translation. Mach Transl. 2014 Mar 1;28(1):1-17. doi: 10.1007/s10590-013-9140-x.
14. Temnikova I. Cognitive evaluation approach for a controlled language post-editing experiment. Proceedings of the Seventh International Conference on Language Resources and Evaluation; May 2010; Valletta, Malta. pp. 3485-3490. http://www.lrec-conf.org/proceedings/lrec2010/pdf/437_Paper.pdf
15. Lacruz I, Denkowski M, Lavie A. Cognitive demand and cognitive effort in post-editing. Third Workshop on Post-Editing Technology and Practice; October 2014; Vancouver, BC, Canada. pp. 73-84. http://www.amtaweb.org/AMTA2014Proceedings/AMTA2014Proceedings_PEWorkshop_final.pdf
16. Koponen M, Aziz W, Ramos L, Specia L. Post-editing time as a measure of cognitive effort. Workshop on Post-Editing Technology and Practice; October 2012; San Diego, CA. http://amta2012.amtaweb.org/AMTA2012Files/html/13/13_paper.pdf
17. Koehn P. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL; June 2010; Los Angeles, CA. pp. 537-545. http://www.aclweb.org/anthology/N10-1078

__________________________________________________________________

We deliver high-quality translations that bring your products and services to new countries faster than ever before.

Highest Quality Translations: Lilt's AI works alongside translators, bringing a level of quality that was previously impossible.
Translations done by domain experts: Our translators have helped some of the greatest companies in the world break into new markets.
Enterprise-level security: Your data are always encrypted and never shared with anyone.

See why top companies choose Lilt. From Zendesk: "We didn't want an off-the-shelf solution.
We needed something we could customize as much as possible to our own vocabulary, and that could instantaneously learn as we went along with our human and machine translations." (Melissa Burch, Zendesk)

Knowledge should be universally accessible. Ours is.

Whitepaper, Machine Translation Evaluation: With so many competing technologies, how can you be sure any one solution will set your business up for success?
Webinar, Intro to Neural Machine Translation: Neural machine translation is everywhere. What are the advantages of using it over existing technologies? Glad you asked.

Become a Translator: Work with the top 1% of translators in the world. Translate where your domain expertise is appreciated, thanks to our predictable project pipeline, fast payment terms, and industry-leading software. Oh, and you'll never be asked to post-edit ever again.

As Seen In:
The Economist, January 15, 2017: Machine Translation: Beyond Babel
The Wall Street Journal, August 19, 2018: Models Will Run The World
BBC World Service, May 27, 2018: From Language to Algorithm
Wired, November 15, 2017: Welcome To The Era of The AI Co-worker
Inc., January 28, 2018: 20 of Marc Benioff's Best Startup Investments

Ready to get started? Let us show you how Lilt can bring your products and services to new countries.

__________________________________________________________________

EN 601.468/668 Machine Translation
Fall 2018, Tuesdays and Thursdays 1:30-2:45, Ames 234
Computer Science Department, Johns Hopkins University

Google Translate instantly translates between any pair of over eighty human languages, like French and English. How does it do that? Why does it make the errors that it does? And how can you build something better? Modern translation systems like Google Translate and Bing Translator learn to translate by reading millions of words of already translated text. This course will show you how they work. We cover fundamental building blocks from linguistics, machine learning (especially deep learning), algorithms, and data structures, showing how they apply to a difficult real-world artificial intelligence problem.

Instructor: Philipp Koehn (phi@jhu.edu)
TAs: Huda Khayrallah (huda@jhu.edu), Brian Thompson (brian.thompson@jhu.edu), Tanay Agarwal (tagarwa2@jhu.edu)
Office hours: Professor by appointment; TAs Monday 10-12, Barton 225, and Tuesday 10:30-11:30, Malone Undergraduate Lab
Discussion forum: Piazza

Textbooks: The class follows two textbooks closely.
+ Statistical Machine Translation by Philipp Koehn, 2010. You can read it online through the JHU library or purchase it from Amazon.
+ Neural Machine Translation by Philipp Koehn, 2019. A draft copy of the book will be distributed by email; contact the professor to receive a copy.

Grading: To understand how machine translation works, you will build a translation system. We will mainly grade hands-on work.
+ Five homework assignments (12% each)
+ Final project (30%)
+ In-class presentation, "Language in ten minutes" (10%)

Homework schedule (tentative):
+ HW1: Analysis, due September 13
+ HW2: Word alignment, due September 27
+ HW3: Decoding, due October 11
+ HW4: Neural translation model part 1, due October 25
+ HW5: Neural translation model part 2, due November 8
Late penalty for homework assignments: 10% per day.

Last updated November 30, 2018. Created with git, jekyll, bootstrap, and vim. Feel free to reuse the source code.

__________________________________________________________________

MateCat: Machine Translation Engines

In MateCat, the best option when creating a project is to select MyMemory, which uses a combination of Google Translate and Microsoft Translator to provide machine translation suggestions. You can also disable machine translation suggestions by unchecking the corresponding box under the field "Use in this project" in the Machine Translation tab. You can also connect machine translation engines provided by MMT, Microsoft Translator Hub, IPTranslator from Iconic, Tilde MT, Apertium, AltLang, Yandex.Translate, Tauyou, SmartMate, and Deeplingo, or your own Moses engines, directly from the MateCat online CAT tool. All you need are the credentials granted by your machine translation provider. To enable them, click on Options on the home page, then on Add MT engine, and select the engine from the Machine Translation dropdown menu. The same steps can be taken from the Language Resources panel. Find out more on this topic in the specific section of the FAQ.

MateCat is an enterprise-level, online CAT tool which makes post-editing and outsourcing easy. It is used by large enterprises not just as a CAT tool, but also as a platform to build innovative services and tools. We provide software customization, hosting, dedicated support, and so on for companies, organizations, and translation agencies with specific requirements.

__________________________________________________________________

How to Use Machine Translation to Localize UGC for Global Websites

Is the use of machine translation evil for SEO? In terms of global website content translation or localization, the best practice is to have content localized professionally by a native speaker. However, just like everything else, there's a best practice, and there's the reality of conducting business. So, what is the reality of running a global website?
How does the best practice apply, or not apply, especially when it comes to user-generated content (UGC)?

The Challenge of Content Localization for Global Websites

One of the real-life situations that businesses deal with is the challenge of increasing user engagement without negatively impacting SEO performance. Site owners agonize over following the best practices for the fixed content on their websites, but the speed and cost of professional translation often prevent them from applying this best practice to UGC translation. Because of this challenge, I often see global websites leaving UGC in English, or in the source language, on their local sites because they are trying to follow the SEO best practice. I understand that website owners are concerned about the SEO implications of machine translation. However, when content is not translated into the local language, it won't help site visitors or website owners. Let's go through this challenge step by step to see if we can find some middle ground.

Selecting Content for Machine Translation

Before we dive deep into the topic, I'd like to clarify that this article is specific to user-generated content, not the entire website. Fixed content should always be translated and localized professionally by humans, without exception. Page headers and commonly used text, such as column labels, should also be localized and checked by humans. If you don't want UGC to rank well in the search results, or even to be indexed by search engines, it is the safest area in which to implement machine translation. User comments, feedback, reviews, and other material that is not the main content of the page can easily be handled with machine translation. Even if the translation is not perfect, it provides helpful information to site visitors when they can read it in their own languages. When the UGC is on pages you wish to be indexed by the search engines and to perform well in the organic search results, you need to determine the best translation solution.

Crowdsourced Translation

This is not machine translation, but another option that some websites use to localize their content. It usually relies on a database of words, which participants access to add the words in other languages. It's a low-cost solution when you have volunteers to do the translation work; Wikipedia is probably the largest global website using it. Because it depends on crowd participation, it comes with some concerns. It is difficult to maintain the quality of the translation, and some languages may take much longer to build a database large enough to translate content. This becomes a bigger issue when the source language is not one of the more widely spoken and read languages. Some machine translation tools let you create a glossary database from words and phrases translated through crowdsourcing. Below is an example of a clearly wrong word showing up in Google's translation tool: when a Japanese word for "mischievous" was entered, it gave an incorrect translation in English. (The translation has since been corrected.)

[Image: Translation of "mischievous" from Japanese to English]

In order to control the quality of the translation and minimize problems, I suggest that you control who can contribute to the translation project by giving tool access only to trusted editors, as sketched below.
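One way to make that access control concrete is a gated glossary workflow, where only entries from trusted editors go live and everything else waits for review. This is a minimal sketch; the names, terms, and data structures are invented for illustration:

```python
# A minimal sketch of a gated crowdsourced glossary: contributions are
# accepted into the live glossary only from trusted editors; everyone
# else's suggestions are queued for review. All names are invented.
TRUSTED_EDITORS = {"alice", "bob"}

glossary = {}          # (source_term, lang) -> approved translation
review_queue = []      # pending suggestions from untrusted contributors

def suggest(contributor, source_term, lang, translation):
    """Route a suggested term translation based on contributor trust."""
    entry = (source_term, lang, translation)
    if contributor in TRUSTED_EDITORS:
        glossary[(source_term, lang)] = translation
    else:
        review_queue.append((contributor, entry))

def approve(index):
    """A trusted reviewer promotes a queued suggestion into the glossary."""
    _, (source_term, lang, translation) = review_queue.pop(index)
    glossary[(source_term, lang)] = translation

suggest("alice", "いたずら", "en", "mischievous")  # trusted: goes live
suggest("mallory", "綺麗", "en", "pretty")          # untrusted: held
approve(0)                                          # reviewer signs off
print(glossary)
```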
The Advancement of Machine Translation with AI

As machine translation technology has advanced with AI, some websites, including large global websites such as Facebook, are implementing neural machine translation (NMT). On Facebook, you can see this real-time, text-to-text translation working on posts and comments. On its code.fb.com site, the company states:

"We have just started being able to use more context for translations. Neural networks open up many future development paths related to adding further context, such as a photo accompanying the text of a post, to create better translations. We are also starting to explore multilingual models that can translate many different language directions. This will solve the challenge of fine-tuning each system relating to a specific language pair, and may also bring quality gains from some directions through the sharing of training data."

Other companies, including Google and Microsoft, also offer NMT solutions for websites and other translation needs. In addition to text translation, Microsoft developed automatic speech recognition (ASR) for audio speech translation, currently used in Skype.

Improve the Quality of the Translation

Even with these advancements, the fact is that machine translation is not perfect just yet. That said, machine translation quality has improved significantly, especially for Western languages. The following are some things you can do to ensure the quality of the translation:

- Create a list of commonly used words (e.g., categories, tags, product names, other keywords). Get them translated professionally, or even in-house, and upload the list to the translation engine.
- Spot-check the translation from time to time to ensure the quality of translated content.
- Add an online dictionary using its API.
- Use industry-specific machine translation, which can handle industry-specific jargon and B2B vocabulary better.

Optimize the Machine Translation Engine

- Integrate translation management system (TMS) environments for machine translation engine implementation.
- Customize the machine translation engine for the content type.
- Create training data for AI and machine learning.

Still concerned about using machine translation in terms of SEO? Here's a comment on machine-translated content by Google's John Mueller:

"I think the kind of the improvements that are happening with regards to automatically translated content… It could also be used by sites that are legitimately providing translations on a website and they just start with like the auto translated version and then they improve those translations over time. So that's something where I wouldn't necessarily say that using translated content like that (spamming content) would be completely problematic but it's more a matter of the intent and kind of the bigger picture what they're doing."

Many websites already use machine translation for their global sites. Their content is indexed and can perform well by providing quality content for local audiences. Indeed, it comes back to the "intent" Mueller spoke about. Translating UGC to provide informative content to your local audience falls under "good intent."

Conclusion

Machine translation can be a great solution for some global websites, specifically for handling large volumes of user-generated content.
Making reviews and comments available in different languages can significantly increase visitor satisfaction, engagement, and (most importantly) sales. Don't let broad standards keep you from serving your consumers. Review the following and make the best decision for your business:

- Determine the content on your site that is appropriate for machine translation.
- Select the translation solution that works best for your website content.
- Optimize the machine translation engine by adding industry-specific terms, keywords, etc.
- Create training data for AI.
- Monitor the quality of the translation.

More resources: 5 Content Management Tips for Global Websites; A Quick Guide to Getting Started in International SEO; A Complete Guide to SEO.

__________________________________________________________________

Machine Translation, by Thierry Poibeau (MIT Press, September 2017; ISBN 9780262534215; 296 pp.; 28 b&w illustrations)

A concise, nontechnical overview of the development of machine translation, including the different approaches, evaluation issues, and major players in the industry.

The dream of a universal translation device goes back many decades, long before Douglas Adams's fictional Babel fish provided this service in The Hitchhiker's Guide to the Galaxy. Since the advent of computers, research has focused on the design of digital machine translation tools—computer programs capable of automatically translating a text from a source language to a target language. This has become one of the most fundamental tasks of artificial intelligence. This volume in the MIT Press Essential Knowledge series offers a concise, nontechnical overview of the development of machine translation, including the different approaches, evaluation issues, and market potential. The main approaches are presented from a largely historical perspective and in an intuitive manner, allowing the reader to understand the main principles without knowing the mathematical details. The book begins by discussing problems that must be solved during the development of a machine translation system and offering a brief overview of the evolution of the field. It then takes up the history of machine translation in more detail, describing its pre-digital beginnings, rule-based approaches, the 1966 ALPAC (Automatic Language Processing Advisory Committee) report and its consequences, the advent of parallel corpora, the example-based paradigm, the statistical paradigm, the segment-based approach, the introduction of more linguistic knowledge into the systems, and the latest approaches based on deep learning. Finally, it considers evaluation challenges and the commercial status of the field, including activities by such major players as Google and Systran.
Thierry Poibeau is Director of Research at the Centre National de la Recherche Scientifique in Paris, Head of the LATTICE (Langues, Textes, Traitements Informatiques, Cognition) Laboratory, and Affiliated Lecturer in the Department of Theoretical and Applied Linguistics at the University of Cambridge.

__________________________________________________________________

Unbabel: the world's only human-quality translation pipeline

Trusted by Microsoft, Change.org, Pinterest, Soundcloud, Under Armour, and King.

AI + Human Translation API: get human-quality translations of your content piped where you need it.
- Continuous translation: Unbabel can translate all your content seamlessly.
- 50,000+ editor community: AI-assisted human translators around the globe translate your content.
- Final human touch: human quality is enforced by skilled professionals before delivery.
- AI, glossaries, and style guides: AI-assisted guidance ensures translation quality and speed at every step.

State-of-the-art translation: Unbabel employs custom machine translation engines using state-of-the-art neural machine translation (NMT) adapted to our customers' domains.

50,000+ editor community: we work with a community of professional translators and native speakers. They're on the move, around the globe, working on the Unbabel Platform on their computers and mobile phones.

Glossaries and style guides: customer glossaries and style guides assure quality and consistency with your brand's voice in every translation.

The world's best Quality Estimation system: Unbabel has the world's most advanced Quality Estimation system, winning multiple shared tasks at the Workshop on Machine Translation by wide margins. We use it to rank our translations and to identify incorrect words for our editors to pay special attention to.

Better than out-of-the-box Google, Microsoft, and Yandex: we incorporate customer-specific training data, machine translation engines adapted by content type, and a host of machine learning algorithms to beat out-of-the-box MT solutions from some of the biggest names in tech.

Developers are welcome: with a fully functional SDK for Python, and SDKs for Ruby and PHP in development, you can put Unbabel to work for you as quickly as possible.
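Unbabel's actual routes and payloads are not documented here, so the following is only a generic sketch of what submitting a job to a translation API of this kind might look like. The endpoint, field names, and response shape are all invented; consult the real API documentation before writing anything like this. It uses the third-party requests library:

```python
import requests

# Hypothetical endpoint and payload, for illustration only; the real
# service's routes, fields, and authentication scheme will differ.
API_URL = "https://api.example.com/v1/translations"

def request_translation(text, source_lang, target_lang, api_key):
    """Submit a text for machine translation plus human postediting
    and return the job record created by the (hypothetical) service."""
    payload = {
        "text": text,
        "source_language": source_lang,
        "target_language": target_lang,
    }
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    job = request_translation("Where is my order?", "en", "pt", "MY_KEY")
    print(job)
```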
Use Cases

The Unbabel Translation API seamlessly integrates with your workflows, business processes, websites, apps, communications platforms, and more.
- Your platform, multilingual: build multilingual platform integrations, like we've done for Salesforce, Zendesk, and more.
- CMS translation on the go: translate within your CMS so you can easily publish new content in multiple languages.
- Build and partner with Unbabel: develop an Unbabel integration with another platform and list it on our Marketplace.
Reach customers in their native language with Unbabel.

__________________________________________________________________

From the AMPLEXOR blog (Machine Translation): "The Role of Translation Technology in Successful Communication," by Kristina Bauer, 4 min read, 11/05/17.

__________________________________________________________________

Pure Neural™ Machine Translation: SYSTRAN's neural translation engine

Artificial intelligence and deep learning applied to language processing.

Neural translation: back to the origins. The terms "deep learning" and "artificial neural networks" are surely not unknown to you. Many of us have used, without knowing it, solutions based on this technology, such as image recognition, big-data analytics, and the virtual assistants that the Web giants have built into their services. More recently, a great deal of research has been conducted on what these new technologies can contribute to language processing.
The results of this research are shared within an open-source community in which SYSTRAN is very active, contributing its own knowledge.

A self-learning machine. Unlike the technologies used on the market until now (statistical and rule-based), the neural engine handles the entire machine translation process through a single artificial neural network. The network is composed of several layers connected to one another with different weights, called the parameters of the network. The key property of the neural network is its capacity to correct its parameters automatically during the training phase (a few weeks). Concretely, what is generated as output is compared to a reference translation, and in return a correction is "back-propagated" to adjust the weights and refine the parameterization of the network's connections. This technology, which rests on complex algorithms at the cutting edge of deep learning, allows the PNMT™ (Pure Neural™ Machine Translation) engine to learn, to derive the rules of a language from a reference translation, and to produce a translation whose quality surpasses the state of the art and proves better than that of a non-native speaker of the language.
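The training loop described above (generate an output, compare it against a reference, back-propagate a correction that adjusts the weights) can be made concrete with a tiny numpy sketch. This is a toy linear model on synthetic data, not a translation network; it only illustrates the compare-and-correct cycle:

```python
import numpy as np

# Toy illustration of the training loop described above: generate an
# output, compare it with a reference, and back-propagate a correction
# that adjusts the weights (the network's "parameters").
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # toy inputs
true_W = rng.normal(size=(8, 4))
Y_ref = X @ true_W                       # toy "reference translations"

W = np.zeros((8, 4))                     # parameters to be learned
lr = 0.05                                # learning rate

for step in range(500):
    Y_out = X @ W                        # forward pass: generate output
    err = Y_out - Y_ref                  # compare with the reference
    grad = X.T @ err / len(X)            # back-propagate the correction
    W -= lr * grad                       # adjust the weights

print(np.abs(W - true_W).max())          # near 0 once training converges
```

A real NMT engine stacks many nonlinear layers and trains on millions of sentence pairs, but the update cycle is the same shape as this loop.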
Helping companies succeed in the era of immediacy. In the digital era, the language barrier has so far been one of the greatest challenges to rapidly deploying an international business strategy. Companies now have the opportunity to reach new markets more easily thanks to the latest advances in artificial intelligence and in machine translation research and development. With this major innovation, SYSTRAN continues its pursuit of technological excellence, aiming to help companies and organizations give themselves the means to succeed in a world of global communication, with its demands for 24/7 availability and real-time responsiveness. SYSTRAN offers organizations access to the best translation quality on the market, close to the fluency of a human translation and adapted to each client's specific needs and domain (legal, automotive, IT, tourism, and so on). Companies can thus deploy their business strategy in several countries simultaneously, overcoming the language barrier and gaining substantially in productivity and time to market.

Testimonials:

"The new PNMT™ technology offers a translation quality and fluency unequaled in the history of machine translation. There remain, however, areas for improvement, on which SYSTRAN's R&D teams are already working. There is no doubt that this technology will open new perspectives for translators and their colleagues in a globalized world." (Heidi Depraetere, founder of Crosslang)

"The PNMT™ translation engine created by SYSTRAN is a great step forward for communication in general, and for tourism in particular. It brings a fabulous field of opportunities, new experiences, an exciting journey into the land of languages! The augmented tourist 2.0 is born!" (Dominique Auzias, founder of Petit Futé)

"Better still, the SYSTRAN PNMT™ engine understood what I meant to say and translated it very fluently. In most cases, the terminology was adequate, and the sentences 'sounded' like human sentences." (Lori Thicke, CEO of Lexcelera)

Specialization multiplies the potential of neural translation. SYSTRAN is today the only player capable of specializing a neural engine; this unique know-how markedly improves translation quality in record time.

"Adapting translation to a specific domain (legal, marketing, technical, and so on) is an absolute necessity for global companies and organizations. Offering professionals translation solutions specialized in their business terminology has been SYSTRAN's DNA for many years. The new generation of neural engines opens new possibilities for specialization: PNMT™ is capable of adapting the generic model to new data, and even to each individual translator. Generic neural translation unquestionably brings a quantum leap in the history of translation technologies, but it is specialized neural translation that will truly allow organizations to reach their international goals." (Jean Senellart, Chief Technology Officer of SYSTRAN)
__________________________________________________________________

Appen: high-quality training data for machine learning, enhanced by human interaction

Machine Translation: drive higher customer satisfaction with automatic translation capabilities that are highly accurate. Build more robust automatic translation systems with high-quality data.

How we help: systems that rely on machine translation require high-quality speech and text data to produce accurate results. It is often challenging to find the resources needed to supply your system with enough quality data in all of your target markets.
Working with an experienced partner can greatly accelerate your time to market and can result in a system that builds stronger customer satisfaction.

Our approach: Appen can help you customize your machine translation engines with a range of services, from domain-specific training and test sets to postediting, machine translation evaluation, and linguistic services. Our skilled project managers work with your team to understand your objectives and timeline, and will customize a program to meet your needs.

Why Appen? For over 20 years, Appen has worked with companies around the world to improve their speech and machine learning-based solutions by providing high-quality, human-annotated data. With coverage of over 180 languages and dialects, we can help you reach more customers around the globe.

Our data services:
- Consultative Services: we work closely with your team to develop a customized program that addresses your unique business challenges.
- Language Technology QA: develop top-notch language-based solutions with language quality assurance services.
- Lexicons and Word Lists: use custom lexicons and word lists to ensure the accuracy of your speech and text-based systems.
- Linguistic Consulting: ensure your solutions meet the needs of customers worldwide with the help of expert linguists.
- Speech Data Collection: use our curated global crowd to collect high-quality speech data in over 180 languages and dialects.
- Text Data Collection: collect millions of high-quality data samples to ensure your solution meets the needs of your customers worldwide.
- Translation and Localization: traditional translation and machine translation services from language and data experts.

Additional resources: Appen Recognized Among Largest Language Service Providers in the World; Insights from Conversational Interaction 2018: NLP, Chatbots, and Comedians; AI Requires a Human Touch: How Appen Recruits Crowds to Improve Technology; Appen Off-the-Shelf Linguistic Resources (quickly expand your products into new markets with licensed language data, with immediate access to a complete speech and language database to accelerate your product development).

Appen is a global leader in the development of high-quality, human-annotated datasets for machine learning and artificial intelligence.
__________________________________________________________________

A history of machine translation from the Cold War to deep learning
Ilya Pestov, Mar 12, 2018
(Photo by Ant Rozetsky on Unsplash)

I open Google Translate twice as often as Facebook, and the instant translation of price tags no longer feels like cyberpunk to me. That's what we call reality. It's hard to imagine that this is the result of a century-long fight to build the algorithms of machine translation, and that there was no visible success during half of that period. The developments I'll discuss in this article set the basis of all modern language processing systems — from search engines to voice-controlled microwaves. I'm talking about the evolution and structure of online translation today.

[Image: The translating machine of P. P. Troyanskii (illustration made from descriptions; no photos survive, unfortunately)]

In the beginning

The story begins in 1933. The Soviet scientist Peter Troyanskii presented "the machine for the selection and printing of words when translating from one language to another" to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera. The operator took the first word from the text, found a corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. The typewriter's keys encoded one of the features. The tape and the camera's film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered "useless." Troyanskii died of stenocardia (angina) after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was at the beginning of the Cold War. On January 7, 1954, at IBM headquarters in New York, the Georgetown-IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history. "A girl who didn't understand a word of the language of the Soviets punched out the Russian messages on IBM cards. The 'brain' dashed off its English translations on an automatic printer at the breakneck speed of two and a half lines per second," reported the IBM press release.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples were carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, this sort of arms race launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.
The race for machine translation

The vain struggle to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. It recommended focusing on dictionary development instead, which took US researchers out of the race for almost a decade. Even so, the basis for modern natural language processing was created by those scientists and their attempts, research, and developments. All of today's search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. Scientists peered over interpreters' work, trying to compel tremendously sluggish computers to repeat those actions. These systems consisted of:

A bilingual dictionary (e.g., RU -> EN)
A set of linguistic rules for each language (for example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That's it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.

PROMT and Systran are the most famous examples of RBMT systems. Just take a look at AliExpress to feel the soft breath of this golden age. But even they had some nuances and subspecies.

Direct machine translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word. The output returns some kind of translation. Usually, it's quite crappy. It seems that the linguists wasted their time for nothing. Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based machine translation

In contrast to direct translation, we first prepare by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get quite a decent conversion of word order in translation. In theory. In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. On the other, it became more complicated because of the increased number of word constructions in comparison with single words.

Interlingual machine translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world's languages (the interlingua). It's the same interlingua Descartes dreamed of: a meta-language that follows universal rules and turns translation into a simple "back and forth" task. The interlingua would then be converted to any target language, and here was the singularity! Because of this conversion, interlingual systems are often confused with transfer-based ones. The difference is that the linguistic rules are specific to each individual language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three, which we can't do in transfer-based systems.

It looks perfect, but in real life it's not. It was extremely hard to create such a universal interlingua — many scientists worked on it their whole lives. They did not succeed, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. But the Meaning-Text Theory alone costs a fortune! The idea of an intermediate language will be back. Let's wait awhile.
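To make the rule-based idea concrete, here is a toy sketch of the direct, dictionary-plus-rules approach described above. Everything in it, the dictionary entries and the single hand-written reordering rule, is hypothetical and far simpler than a real RBMT system:

    # A toy direct machine translation: dictionary lookup plus hand-written rules.
    # All data here is invented; real RBMT systems used thousands of rules.
    bilingual_dict = {
        "the": "le", "red": "rouge", "car": "voiture", "is": "est", "fast": "rapide",
    }

    def translate_direct(sentence):
        words = sentence.lower().split()
        out = [bilingual_dict.get(w, w) for w in words]   # word-for-word lookup
        # One hand-written syntax rule: in French, most adjectives follow the noun.
        for i in range(len(out) - 1):
            if words[i] == "red" and words[i + 1] == "car":
                out[i], out[i + 1] = out[i + 1], out[i]
        return " ".join(out)

    print(translate_direct("the red car is fast"))  # "le voiture rouge est rapide"
    # Note the wrong gender (it should be "la voiture"): exactly the kind of
    # exception the rule writers had to chase, word by word.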
As you can see, all RBMT systems are dumb and terrifying, and that's the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT that are often mentioned are its morphological accuracy (it doesn't confuse words), reproducibility of results (all translators get the same result), and the ability to tune it to a subject area (to teach it terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations where people just say things differently. Any attempt to take all the nuances into account would waste millions of man-hours. And don't forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: "I saw a man on a hill with a telescope"?

Languages did not develop based on a fixed set of rules — a fact which linguists love. They were much more influenced by the history of invasions over the past three hundred years. How could you explain that to a machine? Forty years of the Cold War didn't help in finding any distinct solution. RBMT was dead.

Example-based machine translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War there, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party. So the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English–Japanese translation is extremely complicated. The language structures are completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation. Let's imagine that we have to translate a simple sentence — "I'm going to the cinema." And let's say we've already translated another similar sentence — "I'm going to the theater" — and we can find the word "cinema" in the dictionary. All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation. I build phrases in unfamiliar languages exactly the same way!

EBMT showed the light of day to scientists from all over the world: it turns out you can just feed the machine existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards one.
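A minimal sketch of Nagao's idea, assuming a toy example base of already-translated sentence pairs and a small bilingual dictionary (all data invented):

    # Toy example-based translation: reuse the closest translated example,
    # then patch the one word that differs.
    examples = {
        "i'm going to the theater": "je vais au théâtre",
    }
    dictionary = {"cinema": "cinéma", "theater": "théâtre"}

    def translate_ebmt(sentence):
        src = sentence.lower().rstrip(".")
        # Find an example differing from the input by exactly one word.
        for ex_src, ex_tgt in examples.items():
            a, b = src.split(), ex_src.split()
            if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
                i = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
                old, new = dictionary[b[i]], dictionary[a[i]]
                return ex_tgt.replace(old, new)  # patch the differing word
        return None

    print(translate_ebmt("I'm going to the cinema."))  # "je vais au cinéma"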
The revolutionary invention of statistical translation would happen in just five years.

Statistical machine translation (SMT)

In early 1990, at the IBM Research Center, a machine translation system was shown for the first time that knew nothing about rules or linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns. The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were matched afterwards. This operation was repeated about 500 million times to count, for example, how many times the word "Das Haus" was translated as "house" vs. "building" vs. "construction", and so on. If most of the time the source word was translated as "house", the machine used that. Note that we did not set any rules nor use any dictionaries — all conclusions were drawn by the machine, guided by statistics and the logic that "if people translate it that way, so will I." And so statistical translation was born.

The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more text we used, the better the translation we got.

Google's statistical translation from the inside: it shows not only the probabilities but also the reverse statistics.

There was still one question left: how would the machine correlate the word "Das Haus" with the word "building", and how would we know these were the right translations? The answer was that we wouldn't know. At the start, the machine assumed that the word "Das Haus" correlated equally with every word in the translated sentence. Next, when "Das Haus" appeared in other sentences, the number of correlations with "house" would increase. That's the "word alignment algorithm," a typical task for university-level machine learning.

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? We decided to take the proceedings of the European Parliament and the United Nations Security Council meetings, which were available in the languages of all member countries and are now available for download (the UN Corpora and the Europarl Corpora).

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM's first statistical translation model was called Model 1. Quite elegant, right? Guess what they called the second one?

Model 1: "the bag of words"

Model 1 used a classical approach: split into words and count the statistics. Word order wasn't taken into account. The only trick was translating one word into multiple words. For example, "Der Staubsauger" could turn into "vacuum cleaner," but that didn't mean the reverse would hold. There are some simple implementations in Python: shawa/IBM-Model-1.
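To give a feel for what "counting statistics" means here, below is a minimal sketch of IBM Model 1's expectation-maximization loop on a toy two-sentence corpus. It is a bare-bones illustration of the alignment idea (no NULL token, no real corpus), not the full model:

    from collections import defaultdict

    # Toy parallel corpus (German -> English), purely illustrative.
    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
    ]

    t = defaultdict(lambda: 0.25)  # t(e|f), initialized uniformly

    for _ in range(10):  # EM iterations
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in corpus:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm       # expected alignment count
                    count[(e, f)] += frac
                    total[f] += frac
        for (e, f), c in count.items():          # M-step: renormalize
            t[(e, f)] = c / total[f]

    print(round(t[("house", "haus")], 2))  # converges towards 1.0

Even on this tiny corpus, the probabilities drift toward the right alignments: "haus" only ever co-occurs with "house", so its translation probability climbs with every iteration, exactly the "if people translate it that way, so will I" logic described above.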
Model 2: considering word order in sentences

The lack of knowledge about word order became a problem for Model 1, and word order is very important in some cases. Model 2 dealt with that: it memorized the usual place a word takes in the output sentence and shuffled the words into a more natural order at an intermediate step. Things got better, but they were still kind of crappy.

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or "do" when negating in English ("Ich will keine Persimonen" → "I do not want persimmons"). To deal with this, two more steps were added to Model 3:

The NULL token insertion, if the machine considers a new word necessary
Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered word alignment, but knew nothing about reordering. For example, adjectives often switch places with the noun, and no matter how well the order was memorized, it wouldn't make the output better. Therefore, Model 4 took into account the so-called "relative order": the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for learning and fixed the issue of conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single "true" way, according to the machine. Such systems are not used anymore, as they've been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method builds on all the word-based translation principles: statistics, reordering, and lexical hacks. For learning, however, it split the text not only into words but also into phrases; n-grams, to be precise, which are contiguous sequences of n words. Thus, the machine learned to translate stable combinations of words, which noticeably improved accuracy.

The trick was that the phrases were not always simple syntactic constructions, and the quality of the translation dropped significantly if anyone aware of linguistics and sentence structure interfered. Frederick Jelinek, a pioneer of computational linguistics, once joked about this: "Every time I fire a linguist, the performance of the speech recognizer goes up."

Besides improving accuracy, phrase-based translation provided more options in choosing bilingual texts for learning. For word-based translation, an exact match between the sources was critical, which excluded literary or free translations. Phrase-based translation had no problem learning from them. To improve translation, researchers even started to parse news websites in different languages for that purpose.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based systems right up until 2016. Each of you can probably recall moments when Google either translated a sentence flawlessly or produced complete nonsense, right? The nonsense came from phrase-based features. The good old rule-based approach consistently produced a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns "three hundred" into "300" without any hesitation. That's called a statistical anomaly.

Phrase-based translation became so popular that when you hear "statistical machine translation," it is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state of the art. Back then, no one even suspected that Google was already stoking its fires, getting ready to change our whole image of machine translation.
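As a rough illustration of the n-gram idea above, here is a sketch that translates by preferring the longest known phrase over word-by-word lookup. The phrase table and its entries are invented; a real system would also carry probabilities and a reordering model:

    # Toy phrase-based lookup: prefer the longest known n-gram.
    phrase_table = {
        ("spirit", "is", "willing"): "esprit est fort",
        ("spirit",): "esprit",
        ("is",): "est",
        ("willing",): "disposé",
        ("the",): "le",
    }

    def translate_phrases(words, max_n=3):
        out, i = [], 0
        while i < len(words):
            # Greedy: try the longest phrase first, then back off to shorter ones.
            for n in range(min(max_n, len(words) - i), 0, -1):
                phrase = tuple(words[i:i + n])
                if phrase in phrase_table:
                    out.append(phrase_table[phrase])
                    i += n
                    break
            else:
                out.append(words[i])  # unknown word: pass through
                i += 1
        return " ".join(out)

    print(translate_phrases("the spirit is willing".split()))
    # "le esprit est fort" -- the whole 3-gram wins over word-by-word lookup
    # (and the missing elision, "l'esprit", shows what n-grams still miss)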
Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered "the future of translation," but the idea did not take off. Its proponents believed it was possible to merge it with the rule-based method: perform quite a precise syntactic analysis of the sentence, determining the subject, the predicate, and the other parts of the sentence, and then build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word-alignment issue once and for all.

Example taken from Yamada and Knight [2001] and this great slide show.

The problem is that syntactic parsing works terribly, despite the fact that we considered it solved a while ago (since we have ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than parsing out the subject and the predicate, and every single time I gave up and used another method. Let me know in the comments if you ever succeed with it.

Neural machine translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The internet didn't notice it at all, except Google, who took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist's style? There was no magic. The neural network was taught to recognize the artist's paintings. Next, the last layers containing the network's decision were removed. The resulting stylized picture was just the intermediate image the network produced. That's the network's fantasy, and we consider it beautiful.

If we can transfer style to a photo, what if we try to impose another language on a source text? The text would be that precise "artist's style," and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I'm trying to describe my dog: average size, sharp nose, short tail, always barks. If I gave you this set of the dog's features, and if the description was precise, you could draw it, even though you have never seen it.

Now imagine the source text is a set of specific features. Basically, this means you encode it, and then let another neural network decode it back into text, but in another language. The decoder only knows its own language. It has no idea about the origin of the features, but it can express them in, for example, Spanish. Continuing the analogy, it doesn't matter how you draw the dog, with crayons, watercolor, or your finger. You paint it as you can.

Once again: one neural network can only encode a sentence into a specific set of features, and another one can only decode them back into text. Both know nothing about each other, and each knows only its own language. Recall something? The interlingua is back. Ta-da.

The question is, how do we find those features? It's obvious when we're talking about a dog, but how do we deal with text? Thirty years ago, scientists already tried to create a universal language code, and it ended in total failure. Nevertheless, we have deep learning now. And finding features is its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it's possible to find those features in text as well. Theoretically, we can then pass the features from the neural networks to linguists, so that they can open brave new horizons for themselves.
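A minimal sketch of this encode-to-features, decode-back idea in Keras, with toy dimensions and untrained weights (recurrent layers are used here; the next section discusses why). A real system would be trained on millions of sentence pairs:

    from keras.models import Sequential
    from keras.layers import Embedding, GRU, RepeatVector, TimeDistributed, Dense

    src_vocab, tgt_vocab, src_len, tgt_len = 200, 220, 10, 12  # toy sizes

    model = Sequential()
    # Encoder: compress the source sentence into one 64-dim feature vector.
    model.add(Embedding(src_vocab, 32, input_length=src_len))
    model.add(GRU(64, return_sequences=False))
    # Bridge: feed that same feature vector to every decoder time step.
    model.add(RepeatVector(tgt_len))
    # Decoder: unfold the features into a sentence in the target language.
    model.add(GRU(64, return_sequences=True))
    model.add(TimeDistributed(Dense(tgt_vocab, activation='softmax')))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    model.summary()  # the 64-dim bottleneck is the "set of features" above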
The question is, what type of neural network should be used for encoding and decoding? Convolutional neural networks (CNNs) fit pictures perfectly, since they operate on independent blocks of pixels. But there are no independent blocks in text: every word depends on its surroundings. Text, speech, and music are always sequential. So recurrent neural networks (RNNs) are the best choice to handle them, since they remember the previous result (the prior word, in our case).

RNNs are now used everywhere: Siri's speech recognition (parsing a sequence of sounds, where the next depends on the previous), keyboard suggestions (memorize the prior word, guess the next), music generation, and even chatbots.

For the nerds like me: in fact, neural translators' architectures vary widely. A regular RNN was used at the beginning, then it was upgraded to a bidirectional one, where the translator considered not only the words before the source word but also the words after it. That was much more effective. It was then followed by a hardcore multilayer RNN with LSTM units for long-term storage of the translation context.

In two years, neural networks surpassed everything that had appeared in the previous 20 years of translation. Neural translation makes 50% fewer word-order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case across languages, and no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation was never used. Statistical machine translation methods always worked using English as the key source: if you translated from Russian to German, the machine first translated the text into English and then from English into German, which leads to a double loss. Neural translation doesn't need that — only a decoder is required for it to work. That was the first time that direct translation between languages with no common dictionary became possible.

Google Translate (since 2016)

In 2016, Google turned on neural translation for nine languages. They developed a system named Google Neural Machine Translation (GNMT). It consists of 8 encoder and 8 decoder layers of RNNs, as well as attention connections from the decoder network.

They divided not only sentences, but also words. That was how they dealt with one of the major NMT issues: rare words. NMTs are helpless when a word is not in their lexicon. Say, "Vas3k": I doubt anyone taught the neural network to translate my nickname. In that case, GNMT tries to break words into word pieces and recover a translation from them. Smart.
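GNMT's actual wordpiece model is more involved, but the gist, falling back from whole words to known sub-units, can be sketched like this (the subword vocabulary below is invented):

    # Toy greedy wordpiece segmentation: split an unknown word into the
    # longest known subword units.
    subwords = {"vas", "3", "k", "trans", "lat", "ion"}

    def wordpieces(word):
        pieces, i = [], 0
        while i < len(word):
            # Take the longest prefix of the remainder that is in the vocabulary.
            for j in range(len(word), i, -1):
                if word[i:j].lower() in subwords:
                    pieces.append(word[i:j])
                    i = j
                    break
            else:
                pieces.append(word[i])  # unknown character: emit as-is
                i += 1
        return pieces

    print(wordpieces("Vas3k"))        # ['Vas', '3', 'k']
    print(wordpieces("translation"))  # ['trans', 'lat', 'ion']

The network then translates at the level of these pieces, so even a never-seen nickname gets some translation instead of an out-of-vocabulary token.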
Hint: the Google Translate used for website translation in the browser still uses the old phrase-based algorithm. Somehow, Google hasn't upgraded it, and the differences are quite noticeable compared to the online version.

Google uses a crowdsourcing mechanism in the online version. People can choose the version they consider the most correct, and if enough users like it, Google will always translate the phrase that way and mark it with a special badge. This works fantastically well for short everyday phrases such as "Let's go to the cinema" or "I'm waiting for you." Google knows conversational English better than I do :(

Microsoft's Bing works exactly like Google Translate. But Yandex is different.

Yandex Translate (since 2017)

Yandex launched its neural translation system in 2017. Its main declared feature was hybridity: Yandex combines the neural and statistical approaches to translate a sentence, and then chooses the best result with its favorite CatBoost algorithm. The thing is, neural translation often fails when translating short phrases, since it uses context to choose the right word, and that is hard if a word appeared only a few times in the training data. In such cases, a simple statistical translation finds the right word quickly and simply.

Yandex doesn't share the details. It fends us off with marketing press releases. OKAY.

It looks like Google uses SMT for the translation of words and short phrases. They don't mention that in any articles, but it's quite noticeable if you look at the difference between translations of short and long expressions. Besides, SMT is used for displaying a word's statistics.

The conclusion and the future

Everyone's still excited about the idea of a "Babel fish": instant speech translation. Google has made steps toward it with its Pixel Buds, but in fact it's still not what we were dreaming of. Instant speech translation is different from ordinary translation: you need to know when to start translating and when to shut up and listen. I haven't seen suitable approaches to solving this yet. Unless, maybe, Skype...

And here's one more empty area: all the learning is limited to sets of parallel text blocks. The deepest neural networks still learn from parallel texts. We can't teach a neural network without providing it with a source. People, by contrast, can expand their lexicon by reading books or articles, even without translating them into their native language. If people can do it, a neural network should be able to do it too, in theory. I found only one prototype attempting to get a network that knows one language to read texts in another language in order to gain experience. I'd try it myself, but I'm silly. Ok, that's it.

This story was originally written in Russian and then translated into English on Vas3k.com by Vasily Zubarev. He is my pen-friend and I'm pretty sure that his blog should be spread.

Useful links

Philipp Koehn: Statistical Machine Translation. The most complete collection of methods I've found.
Moses, a popular library for building your own statistical translation systems.
OpenNMT, one more library, but for neural translators.
An article from one of my favorite bloggers explaining RNNs and LSTMs.
The video "How to Make a Language Translator": funny guy, neat explanation. Still not enough.
A text guide from TensorFlow on creating your own neural translator, for those who want more examples and to try the code.
Memsource Cloud User Manual | Manage Machine Translation via Memsource

Memsource users can now purchase machine translation characters and track machine translation character usage directly in Memsource.

Purchasing Machine Translation Characters

Currently, it is only possible to purchase characters for Microsoft Translator, Microsoft Translator Hub, and Microsoft Custom Translator. When you manage Microsoft Translator, Microsoft Translator Hub, or Microsoft Custom Translator in Memsource, you receive 2 million free characters per month.

If you select Microsoft Translator (+free characters), Microsoft Translator Hub (+free characters), or Microsoft Custom Translator (+free characters) from the list of supported MT engines, you will automatically create an MT engine that is managed via Memsource. This means you can purchase MT characters for this engine in Memsource without creating a Microsoft account.

If you have an existing Microsoft Translator or Microsoft Translator Hub account that you want to manage via Memsource, complete the following steps:

1) On the Machine Translation Settings page, select the MT engine and click Edit.
2) Select the Get free characters check box.
3) Click Save.

To opt out of managing an MT engine via Memsource, see the main Machine Translation article.

To buy more characters, select Buy Characters next to the appropriate MT engine on the Machine Translation Settings page. There are three bundles available:

2 million characters - $20 (€17)
5 million characters - $50 (€43)
10 million characters - $100 (€86)

Select a bundle and then follow the instructions on the payment pages. Once you have bought the characters, you will see that they have been added to the Remaining characters column on the Machine Translation Settings page. An invoice will have been generated automatically. A link to the invoice will be available in the green banner that appears when the payment has been successful. It can also be viewed by going to Setup > Subscription > Details. The invoice will be called Machine Translation. There is no time limit on using these characters; once purchased, they remain in your account.

How Free and Paid Characters Are Used

When using Microsoft Translator (+free characters), Microsoft Translator Hub (+free characters), or Microsoft Custom Translator (+free characters) via Memsource, your balance of free characters is topped up to 2 million every month. Unused free characters are not carried over to the next month. If you purchase characters on top of your free characters, free characters are always consumed first.

Example: You set up an MT engine and receive 2 million free characters. Then you buy another 5 million characters. During the month you consume only 1.5 million characters. This means that next month we will give you another 1.5 million free characters (topping your balance back up to 2 million), and none of the 5 million characters you purchased will have been consumed.
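A small sketch of this free-first consumption logic as we read it from the description above (the function names and the accounting model are ours, not Memsource's):

    # Hypothetical model of the character accounting described above:
    # free characters are spent before paid ones, and free characters
    # reset to the monthly quota instead of carrying over.
    FREE_MONTHLY = 2_000_000

    def consume(free, paid, used):
        """Spend `used` characters, free balance first."""
        from_free = min(free, used)
        return free - from_free, paid - (used - from_free)

    def monthly_topup(free, paid):
        """Free characters do not carry over; reset them to the quota."""
        return FREE_MONTHLY, paid

    free, paid = FREE_MONTHLY, 5_000_000   # the example from the text
    free, paid = consume(free, paid, 1_500_000)
    print(free, paid)                      # 500000 5000000 -- paid untouched
    free, paid = monthly_topup(free, paid)
    print(free, paid)                      # 2000000 5000000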
Monitoring Character Usage

On the Machine Translation Settings page you will see a usage chart for the different MT engines in your account. Currently, the chart can only display data from the past 30 days. You can view data for a specific engine by deselecting the names of the other engines at the bottom of the chart.

Please note: by default, Project Managers will only be able to see data related to projects they have created. For PMs to see all data for all projects in an organization, an Admin user for the organization will need to adjust the user settings by going to Setup > User, finding the user, and selecting View all data next to the option Home page Dashboards.

Also on the Machine Translation Settings page, you will see the usual list of engines associated with the account, with one extra column: Remaining Characters. For the supported MT engines, Microsoft Translator and Microsoft Translator Hub, you will see the number of remaining characters available for each engine. As you use characters, this number will decrease. For other MT engines, the remaining characters will be unknown.

MT News

AMTA 2018 | Proceedings for the Conference, Keynotes, Workshops and Tutorials (March 21, 2018). Main Conference Research Track; Commercial and Government Tracks; Keynotes: Arianna Bisazza, Leiden…

AMTA 2018 | Workshop | The Role of Authoritative Standards in the MT Environment (January 30, 2018). In this workshop, we will bring together experts from across the standards community, including from the American Society for Testing…

AMTA 2018 | Tutorial | ModernMT: Open-Source Adaptive Neural MT for Enterprises and Translators (January 30, 2018). Nowadays, computer-assisted translation (CAT) tools represent the dominant technology in the translation market, and those including machine translation (MT)…

AMTA 2018 | Tutorial | MQM-DQF: A Good Marriage (Translation Quality for the 21st Century) (January 30, 2018). In the past three years, the language industry has been converging on the use of the MQM-DQF framework for analytic…

AMTA 2018 | Tutorial | A Deep Learning curve for Post-Editing (January 30, 2018). Does post-editing also require a deep learning curve? How do the neural networks of post-editors work in concert with neural…

AMTA 2018 | Tutorial | De-mystifying Neural MT (January 30, 2018). Neural Machine Translation technology is progressing at a very rapid pace. In the last few years, the research community has…

AMTA 2018 | Tutorial | Getting Started Customizing MT with Microsoft Translator Hub: From Pilot Project to Production (January 30, 2018). Learn strategies to plan and carry out an effective pilot project to train…

AMTA 2018 | Tutorial | Corpora Quality Management for MT – Practices and Roles (January 17, 2018). Presenters: Nicola Ueffing (eBay MT Science), Pete Smith (University of Texas Arlington), and Silvio Picinini (eBay Localization)…

AMTA 2018 | Workshop | Translation Quality Estimation and Automatic Post-Editing (January 2, 2018). Boston, Massachusetts, March 21, 2018. The goal of quality estimation is to evaluate a translation system's quality without access to…

Researchers | Where to publish MT-related research? Here are some of the most prestigious international conferences and scientific journals that publish research papers related to machine translation.
TAUS Guidelines on Post-Editing | TAUS Post-Editing Guidelines (created in partnership with CNGL): general post-editing guidelines for "good enough" and "human translation level" quality, and post-editing pricing…

Developers | Slate from Precision Translation Tools. Precision Translation Tools announces the release of Slate, the first packaged SMT toolkit for native Windows x86-64 operating systems…

MT for Translators | During the last couple of years, machine translation post-editing has become one of the hottest and most discussed topics in the translation industry, as evidenced by conferences, forums, and webinars.

MT as part of a translation service | Machine translation as a service can be either a byproduct for teams and companies that develop MT technology for the above-mentioned use cases, or the focus of their MT technology development.

Government MT Users | Features of machine translation (MT) implementations and project efforts in official settings, regardless of jurisdiction, are guided by at least three attributes common to the administration of authority.

NIPS 2016 | Poster | Abstract

While neural machine translation (NMT) has made good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training-data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and another agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods). We call the corresponding approach to neural machine translation dual-NMT. Experiments show that dual-NMT works very well on English↔French translation; in particular, by learning from monolingual data (with 10% bilingual data for a warm start), it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task.
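To show the shape of the closed loop the abstract describes, here is a deliberately toy round-trip sketch. Nothing in it is the paper's implementation: the "models" are dictionaries and the reward is a crude word-overlap score, standing in for the language-model and reconstruction signals that drive the real policy-gradient updates:

    # Schematic dual-NMT loop on toy stand-ins.
    en_fr = {"hello": "bonjour", "world": "monde"}
    fr_en = {"bonjour": "hello", "monde": "world"}

    def translate(model, sentence):
        return " ".join(model.get(w, w) for w in sentence.split())

    def reconstruction_reward(original, round_trip):
        a, b = set(original.split()), set(round_trip.split())
        return len(a & b) / max(len(a | b), 1)

    sentence = "hello world"
    french = translate(en_fr, sentence)             # primal: EN -> FR
    back = translate(fr_en, french)                 # dual:   FR -> EN
    reward = reconstruction_reward(sentence, back)  # feedback, no labels needed
    print(french, "| reward:", reward)              # bonjour monde | reward: 1.0
    # In dual-NMT this reward (plus a language-model score for `french`)
    # would drive policy-gradient updates of both translation models.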
Unbabel

Solutions for customer service: increase customer satisfaction, cut down response times, and build a more efficient operation. Unbabel for Zendesk: get multilingual with Zendesk Support, Chat, and Guide. Unbabel for Freshdesk: deliver customer support in 28 languages on Freshdesk. Unbabel for Salesforce: seamless translation solutions for Service Cloud, Knowledge, and Live Agent. Unbabel for Video: your one-stop shop for high-quality transcription, translation, and subtitling APIs.

The world's only human-quality translation pipeline

AI + Human Translation API: Get human-quality translations of your content piped where you need it.
Continuous translation: Unbabel can translate all your content seamlessly.
50,000+ editor community: AI-assisted human translators around the globe translate your content.
Final human touch: Human quality is enforced by skilled professionals before delivery.
AI, glossaries, and style guides: AI-assisted guidance ensures translation quality and speed at every step.

State-of-the-art translation: Unbabel employs custom machine translation engines using state-of-the-art neural machine translation (NMT) adapted to its customers' domains.

50,000+ editor community: We work with a community of professional translators and native speakers. They're on the move, around the globe, working on the Unbabel Platform on their computers and mobile phones.

Glossaries & style guides: Customer glossaries and style guides assure quality and consistency with your brand's voice in every translation.

Quality Estimation: Unbabel has a state-of-the-art Quality Estimation system, which has won multiple shared tasks at the Workshop on Machine Translation by wide margins. We use it to rank our translations and to identify incorrect words for our editors to pay special attention to.

Better than out-of-the-box Google, Microsoft, and Yandex: We incorporate customer-specific training data, machine translation engines adapted by content type, and a host of machine learning algorithms to beat out-of-the-box MT solutions from some of the biggest names in tech.

Developers are welcome: With a fully functional SDK for Python, and SDKs for Ruby and PHP in development, you can put Unbabel to work for you as quickly as possible.

Use Cases: The Unbabel Translation API seamlessly integrates with your workflows, business processes, websites, apps, comms platforms, and more. Build multilingual platform integrations, like we've done for Salesforce, Zendesk, and more.
CMS translation on the go: Translate within your CMS so you can easily publish new content in multiple languages. Build and partner with Unbabel: Develop an Unbabel integration with another platform and list it on our Marketplace.

Neural Machine Translation with Python

Susan Li
Jun 23, 2018
Photo credit: eLearning Industry

Machine translation, sometimes referred to by the abbreviation MT, is a very challenging task: investigating the use of software to translate text or speech from one language to another. Traditionally, it has involved large statistical models developed using highly sophisticated linguistic knowledge. Here, we are going to use deep neural networks for the problem of machine translation. We will discover how to develop a neural machine translation model for translating English to French. Our model will accept English text as input and return the French translation. To be more precise, we will practice building four models:

A simple RNN.
An RNN with embedding.
A bidirectional RNN.
An encoder-decoder model.

Training and evaluating deep neural networks is a computationally intensive task. I used an AWS EC2 instance to run all of the code. If you plan to follow along, you should have access to GPU instances.

Import the libraries

    import collections
    import helper
    import numpy as np
    import project_tests as tests
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.models import Model
    from keras.layers import GRU, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional
    from keras.layers.embeddings import Embedding
    from keras.optimizers import Adam
    from keras.losses import sparse_categorical_crossentropy

I use helper.py to load the data, and project_tests.py is for testing our functions.

The Data

The dataset contains a relatively small vocabulary and can be found here. The small_vocab_en file contains English sentences, and their French translations are in the small_vocab_fr file.

Load the data

    english_sentences = helper.load_data('data/small_vocab_en')
    french_sentences = helper.load_data('data/small_vocab_fr')
    print('Dataset Loaded')

Sample sentences

Each line in small_vocab_en contains an English sentence, with the respective translation on the corresponding line of small_vocab_fr.
    for sample_i in range(2):
        print('small_vocab_en Line {}: {}'.format(sample_i + 1, english_sentences[sample_i]))
        print('small_vocab_fr Line {}: {}'.format(sample_i + 1, french_sentences[sample_i]))

Figure 1

Vocabulary

The complexity of the problem is determined by the complexity of the vocabulary: a more complex vocabulary makes for a more complex problem. Let's look at the complexity of the dataset we'll be working with.

    english_words_counter = collections.Counter(
        [word for sentence in english_sentences for word in sentence.split()])
    french_words_counter = collections.Counter(
        [word for sentence in french_sentences for word in sentence.split()])

    print('{} English words.'.format(len([word for sentence in english_sentences for word in sentence.split()])))
    print('{} unique English words.'.format(len(english_words_counter)))
    print('10 Most common words in the English dataset:')
    print('"' + '" "'.join(list(zip(*english_words_counter.most_common(10)))[0]) + '"')
    print()
    print('{} French words.'.format(len([word for sentence in french_sentences for word in sentence.split()])))
    print('{} unique French words.'.format(len(french_words_counter)))
    print('10 Most common words in the French dataset:')
    print('"' + '" "'.join(list(zip(*french_words_counter.most_common(10)))[0]) + '"')

Figure 2

Pre-process

We will convert the text into sequences of integers using the following pre-processing steps:

1. Tokenize the words into ids.
2. Add padding to make all the sequences the same length.

Tokenize

Turn each sentence into a sequence of word ids using Keras's Tokenizer. Use this function to tokenize english_sentences and french_sentences. The function tokenize returns the tokenized input and the fitted tokenizer.

    def tokenize(x):
        # Fit a tokenizer on the text and convert each sentence to word ids.
        x_tk = Tokenizer(char_level=False)
        x_tk.fit_on_texts(x)
        return x_tk.texts_to_sequences(x), x_tk

    text_sentences = [
        'The quick brown fox jumps over the lazy dog .',
        'By Jove , my quick study of lexicography won a prize .',
        'This is a short sentence .']
    text_tokenized, text_tokenizer = tokenize(text_sentences)
    print(text_tokenizer.word_index)
    print()
    for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):
        print('Sequence {} in x'.format(sample_i + 1))
        print('  Input:  {}'.format(sent))
        print('  Output: {}'.format(token_sent))

Figure 3

Padding

Make sure all the English sequences have the same length, and all the French sequences have the same length, by adding padding to the end of each sequence using Keras's pad_sequences function.
    def pad(x, length=None):
        # Pad every sequence in x to `length` (default: the longest sequence).
        if length is None:
            length = max([len(sentence) for sentence in x])
        return pad_sequences(x, maxlen=length, padding='post')

    tests.test_pad(pad)

    # Pad the tokenized output
    test_pad = pad(text_tokenized)
    for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
        print('Sequence {} in x'.format(sample_i + 1))
        print('  Input:  {}'.format(np.array(token_sent)))
        print('  Output: {}'.format(pad_sent))

Figure 4

Pre-process Pipeline

Implement a preprocess function:

    def preprocess(x, y):
        preprocess_x, x_tk = tokenize(x)
        preprocess_y, y_tk = tokenize(y)
        preprocess_x = pad(preprocess_x)
        preprocess_y = pad(preprocess_y)
        # Keras's sparse_categorical_crossentropy requires the labels in 3 dimensions.
        preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)
        return preprocess_x, preprocess_y, x_tk, y_tk

    preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer = \
        preprocess(english_sentences, french_sentences)

    max_english_sequence_length = preproc_english_sentences.shape[1]
    max_french_sequence_length = preproc_french_sentences.shape[1]
    english_vocab_size = len(english_tokenizer.word_index)
    french_vocab_size = len(french_tokenizer.word_index)

    print('Data Preprocessed')
    print("Max English sentence length:", max_english_sequence_length)
    print("Max French sentence length:", max_french_sequence_length)
    print("English vocabulary size:", english_vocab_size)
    print("French vocabulary size:", french_vocab_size)

Figure 5

Models

In this section, we will experiment with various neural network architectures. We will begin by training four relatively simple architectures:

Model 1 is a simple RNN.
Model 2 is an RNN with embedding.
Model 3 is a bidirectional RNN.
Model 4 is an encoder-decoder RNN.

After experimenting with the four simple architectures, we will construct a deeper model designed to outperform all four.

Ids Back to Text

The neural network will translate the input into sequences of word ids, which isn't the final form we want: we want the French translation. The function logits_to_text bridges the gap between the logits from the neural network and the French translation. We will use this function to better understand the output of the neural network.

    def logits_to_text(logits, tokenizer):
        # Map each time step's most probable word id back to its word.
        index_to_words = {id: word for word, id in tokenizer.word_index.items()}
        index_to_words[0] = '<PAD>'
        return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])

    print('`logits_to_text` function loaded.')

Model 1: RNN

Figure 6

We start by creating a basic RNN model, which is a good baseline for sequence data translating English to French.
    def simple_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        input_seq = Input(input_shape[1:])
        rnn = GRU(64, return_sequences=True)(input_seq)
        logits = TimeDistributed(Dense(french_vocab_size))(rnn)
        model = Model(input_seq, Activation('softmax')(logits))
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_simple_model(simple_model)

    tmp_x = pad(preproc_english_sentences, max_french_sequence_length)
    tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

    # Train the neural network
    simple_rnn_model = simple_model(
        tmp_x.shape,
        max_french_sequence_length,
        english_vocab_size,
        french_vocab_size)
    simple_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

    # Print prediction(s)
    print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 7

The basic RNN model's validation accuracy ends at 0.6039.

Model 2: Embedding

Figure 8

An embedding is a vector representation of a word that sits close to similar words in n-dimensional space, where n is the size of the embedding vectors. We will create an RNN model using embedding.

    from keras.models import Sequential

    def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        rnn = GRU(64, return_sequences=True, activation="tanh")
        embedding = Embedding(french_vocab_size, 64, input_length=input_shape[1])
        logits = TimeDistributed(Dense(french_vocab_size, activation="softmax"))
        model = Sequential()
        # Embedding can only be used as the first layer (see the Keras documentation).
        model.add(embedding)
        model.add(rnn)
        model.add(logits)
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_embed_model(embed_model)

    tmp_x = pad(preproc_english_sentences, max_french_sequence_length)
    tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

    embeded_model = embed_model(
        tmp_x.shape,
        max_french_sequence_length,
        english_vocab_size,
        french_vocab_size)
    embeded_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

    print(logits_to_text(embeded_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 9

The embedding model's validation accuracy ends at 0.8401.

Model 3: Bidirectional RNNs

Figure 10

    def bd_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        model = Sequential()
        model.add(Bidirectional(GRU(128, return_sequences=True, dropout=0.1),
                                input_shape=input_shape[1:]))
        model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_bd_model(bd_model)

    tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
    tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

    bidi_model = bd_model(
        tmp_x.shape,
        preproc_french_sentences.shape[1],
        len(english_tokenizer.word_index) + 1,
        len(french_tokenizer.word_index) + 1)
    bidi_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)

    # Print prediction(s)
    print(logits_to_text(bidi_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 11

The bidirectional RNN model's validation accuracy ends at 0.5992.
Model 4: Encoder-Decoder

The encoder creates a matrix representation of the sentence; the decoder takes this representation as input and predicts the translation as output.

    def encdec_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        model = Sequential()
        # Encoder: compress the input sequence into a single vector.
        model.add(GRU(128, input_shape=input_shape[1:], return_sequences=False))
        model.add(RepeatVector(output_sequence_length))
        # Decoder: unfold the vector into the output sequence.
        model.add(GRU(128, return_sequences=True))
        model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_encdec_model(encdec_model)

    tmp_x = pad(preproc_english_sentences)
    tmp_x = tmp_x.reshape((-1, preproc_english_sentences.shape[1], 1))

    encodeco_model = encdec_model(
        tmp_x.shape,
        preproc_french_sentences.shape[1],
        len(english_tokenizer.word_index) + 1,
        len(french_tokenizer.word_index) + 1)
    encodeco_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)

    print(logits_to_text(encodeco_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 12

The encoder-decoder model's validation accuracy ends at 0.6406.

Model 5: Custom

Create a model_final that incorporates both embedding and a bidirectional RNN into one model. At this stage we need to experiment a little, for example changing the GRU units to 256, changing the learning rate to 0.005, or training the model for more (or fewer) than 20 epochs.

    def model_final(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        model = Sequential()
        model.add(Embedding(input_dim=english_vocab_size, output_dim=128, input_length=input_shape[1]))
        model.add(Bidirectional(GRU(256, return_sequences=False)))
        model.add(RepeatVector(output_sequence_length))
        model.add(Bidirectional(GRU(256, return_sequences=True)))
        model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
        learning_rate = 0.005
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_model_final(model_final)
    print('Final Model Loaded')

Prediction

    def final_predictions(x, y, x_tk, y_tk):
        tmp_X = pad(preproc_english_sentences)
        model = model_final(tmp_X.shape,
                            preproc_french_sentences.shape[1],
                            len(english_tokenizer.word_index) + 1,
                            len(french_tokenizer.word_index) + 1)
        model.fit(tmp_X, preproc_french_sentences, batch_size=1024, epochs=17, validation_split=0.2)

        y_id_to_word = {value: key for key, value in y_tk.word_index.items()}
        y_id_to_word[0] = '<PAD>'

        sentence = 'he saw a old yellow truck'
        sentence = [x_tk.word_index[word] for word in sentence.split()]
        sentence = pad_sequences([sentence], maxlen=x.shape[-1], padding='post')
        sentences = np.array([sentence[0], x[0]])
        predictions = model.predict(sentences, len(sentences))

        print('Sample 1:')
        print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[0]]))
        print('Il a vu un vieux camion jaune')
        print('Sample 2:')
        print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[1]]))
        print(' '.join([y_id_to_word[np.max(x)] for x in y[0]]))

    final_predictions(preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer)

Figure 13

We get perfect translations on both sentences and a 0.9776 validation accuracy score! The source code can be found on GitHub. I look forward to hearing feedback or questions.
MIT Technology Review | Business Impact

Human translators are still on top—for now

Machine translation works well for sentences but turns out to falter at the document level, computational linguists have found.

by Emerging Technology from the arXiv
September 5, 2018

You may have missed the popping of champagne corks and the shower of ticker tape, but in recent months computational linguists have begun to claim that neural machine translation now matches the performance of human translators.

The technique of using a neural network to translate text from one language into another has improved by leaps and bounds in recent years, thanks to ongoing breakthroughs in machine learning and artificial intelligence. So it is not really a surprise that machines have approached the performance of humans. Indeed, computational linguists have good evidence to back up this claim.

But today, Samuel Laubli at the University of Zurich and a couple of colleagues say the champagne should go back on ice. They do not dispute their colleagues' results, but say the testing protocol fails to take account of the way humans read entire documents. When this is assessed, machines lag significantly behind humans, they say.

At issue is how machine translation should be evaluated. This is currently done on two measures: adequacy and fluency. The adequacy of a translation is determined by professional human translators who read both the original text and the translation to see how well it expresses the meaning of the source. Fluency is judged by monolingual readers who see only the translation and determine how well it is expressed in English. Computational linguists agree that this system gives useful ratings.
But according to Laubli and co, the current protocol only compares translations at the sentence level, whereas humans also evaluate text at the document level. So they have developed a new protocol to compare the performance of machine and human translators at the document level. They asked professional translators to assess how well machines and humans translated over 100 news articles written in Chinese into English. The examiners rated each translation for adequacy and fluency at the sentence level but, crucially, also at the level of the entire document.

The results make for interesting reading. To start with, Laubli and co found no significant difference in the way professional translators rated the adequacy of machine- and human-translated sentences. By this measure, humans and machines are equally good translators, which is in line with previous findings. However, when it comes to evaluating the entire document, human translations are rated as more adequate and more fluent than machine translations. "Human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences," they say.

The researchers think they know why. "We hypothesise that document-level evaluation unveils errors such as mistranslation of an ambiguous word, or errors related to textual cohesion and coherence, which remain hard or impossible to spot in a sentence-level evaluation," they say. The team gives the example of a new app called "微信挪车," which humans consistently translate as "WeChat Move the Car" but which machines often translate in several different ways in the same article, such as "Move Car," "WeChat mobile," and "WeChat Move." This kind of inconsistency, say Laubli and co, makes documents harder to follow.

This suggests that the way machine translation is evaluated needs to evolve away from a system where machines consider each sentence in isolation. "As machine translation quality improves, translations will become harder to discriminate in terms of quality, and it may be time to shift towards document-level evaluation, which gives raters more context to understand the original text and its translation, and also exposes translation errors related to discourse phenomena which remain invisible in a sentence-level evaluation," say Laubli and co. That change should help machine translation improve. Which means it is still set to surpass human translation—just not yet.

Ref: arxiv.org/abs/1808.07048: Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
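The terminology inconsistency Laubli and co describe is also easy to surface mechanically. As a rough illustration only (a hypothetical helper, not the evaluation protocol from the paper), one could scan a machine-translated document for competing renderings of a single source term:

    def distinct_renderings(translated_doc, candidates):
        """Return which candidate renderings of one source term occur in a
        machine-translated document; more than one suggests incoherence."""
        found = set()
        for sentence in translated_doc:
            # Match longest candidates first so "WeChat Move" does not
            # shadow "WeChat Move the Car".
            for cand in sorted(candidates, key=len, reverse=True):
                if cand in sentence:
                    found.add(cand)
                    break
        return found

    mt_output = [
        "Users can notify a driver through WeChat mobile.",
        "WeChat Move the Car launched last year.",
    ]
    print(distinct_renderings(mt_output,
                              ["WeChat Move the Car", "WeChat mobile", "WeChat Move"]))
    # {'WeChat Move the Car', 'WeChat mobile'} -> two renderings, inconsistent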
Iconic Translation Machines Ltd.

E-discovery Translation: Iconic's e-discovery solutions provide better-quality, secure, robust, real-time translation tailored to your legal case, allowing you to streamline multilingual document review of ESI and making the process faster, more effective, and more cost-efficient.

Enterprise Machine Translation: Iconic delivers high-quality customised machine translation solutions for enterprise users, adapted to your language, content, and style by our team of linguistic experts. It's MT with subject matter expertise.

Patent Translation Solutions: We created the world's first patent-specific machine translation engines. Our IPTranslator technology offers best-in-class machine translation performance for the translation of patents and related documents.

The Enterprise Machine Translation System: We develop Neural Machine Translation with Subject Matter Expertise, enabling the world's largest corporations, service providers, and government organisations to adopt specialist AI-powered translation solutions of superior quality.

Neural Machine Translation: Our proprietary Ensemble Architecture enables superior MT engines with a mix of neural, statistical, rule-based, and linguistic-engineering techniques adapted to suit each content type and language.

Enterprise Solutions: We provide leading enterprise MT solutions, developed by our expert team of MT PhDs and specialist engineers. We constantly innovate to deliver cutting-edge MT software solutions and ensure that your quality requirements are exceeded.

E-discovery Translation: Our e-discovery translation software empowers you to search for and find the most relevant documents in your multilingual content, in your language, at a moment's notice. Translate vast amounts of foreign-language ESI quickly, securely, and effectively.

Case Studies: Our company is committed to providing reliable solutions in the long run. Read about our clients who have successfully adopted MT in their business and the benefits they saw to the bottom line. "Iconic delivered measurable productivity gains from the outset. Rarely have we seen the complexities and unforeseen but inevitable surprises of MT integration in large-scale production processes handled as competently and efficiently."
Co-authors: Angelika Clayton and Bing Zhao

The need for economic opportunity is global, and that is represented by the fact that more than half of LinkedIn's active members live outside of the U.S. Engagement across language barriers and borders comes with a certain set of challenges—one of which is providing a way for members to communicate in their native language. In fact, translation of member posts has been one of our most requested features, and now it's finally here.

Dynamic (immediate) translation in the feed has been a tiger-team effort from the get-go: a team of passionate localization evangelists and hungry engineers took on the challenge of realizing an opportunity that relied heavily on collaboration across different teams. We began with a small prototype to prove and test a concept, and ramped to a very small section of our membership. As the concept proved successful, we used that experience to develop a more scalable solution incorporating more languages. There are three central components that we had to incorporate: language detection, machine translation (MT), and feed experience.

[seetranslation1]

Language detection and tagging

We separated the processes of content language detection and actual translation to improve the member experience with international content in the feed. Separating the content language detection step from translation allowed us to build a base for flexible, efficient dynamic language translation, to expand support for various content types, and to generate data for the use of relevance and analytics teams. Language detection is a near-real-time application processing high volumes of member-generated content data distributed across multiple Espresso stores. Instead of consuming directly from the databases, we needed access to all the database changes without impacting the online queries. For this reason, we chose Brooklin, used at LinkedIn as a change data capture service, to stream change events from Espresso.
Our language detection application consumes the change stream containing events for each write performed on the content databases.

[seetranslation2]

To improve language detection quality, the data extracted by Samza jobs goes through filtering and cleansing (for example, mentions and hashtags are excluded from the language detection process). Filtered data is forwarded via the LinkedIn GaaP service (Gateway-as-a-Service) to the Microsoft Text Analytics API, an Azure Cognitive Service that can detect up to 120 languages. The data is tagged with the language detection results, i.e., locale ID and confidence score, and is available for processing by other applications. In the content language detection and tagging process, we utilize multiple open source frameworks, services, and tools originally developed by LinkedIn, such as Kafka, Samza, and Rest.li.

Feed experience

The initial small-scale prototype on short-form member posts involved the implementation of a "See Translation" button whenever the language of the post, detected through a separate network call to the Microsoft Translator API (another Azure Cognitive Service), did not match the member's interface language. When clicked, the button would display the text translated into the member's interface language. The prototype was a proof of concept for internal ramping and a very limited external ramp, as a learning and evaluation exercise. It was very successful in that member feedback was positive both in terms of the value of the feature itself and the quality of the translated content. The prototype also allowed us to identify several areas that needed to be improved before we ramped to all members and all feed content:

Locale detection: When the prototype was released, our service was making dual calls to Microsoft, one for language detection and one for translation, which was fine for a prototype but too slow to scale the experience. It also meant that we did not retain the locale of unique content for statistical analysis.

Locale comparison: This is new logic that did not exist in the prototype. Now, we take the locale inferred asynchronously by language detection and compare it with the member's interface locale. We no longer need to request this from Microsoft, as we were doing for the prototype, which significantly reduces the number of calls made. We now only render the "See translation" button if those locales differ, which makes for a much more intuitive member experience.

Other content types: The prototype only worked on original posts; the new model also renders the functionality on root shares, viral shares, and re-shares of organic updates.

Our current design is split into two main flows: Translation Render and Translation Trigger.

Translation Render flow: [seetranslation3]

Translation Trigger flow: [seetranslation4]

Polyglot-Online

The Polyglot-Online mid-tier service uses GaaP to safely send encrypted text snippets to the Translator Text API. An additional advantage of this framework is the ability to customize the translation models for a specific domain (like our feed) and to integrate logic for filtering translation outputs based on system confidence scores. The API supports more than 60 languages in any translation direction, all of which we can leverage once the source-language locale of a piece of content has been detected. For this feed feature, we selectively translate source text into 24 target languages, to match each member's interface locale supported by LinkedIn.
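As an illustration of the detect-then-compare flow described above, here is a minimal sketch that calls the public Azure Text Analytics v3.0 languages endpoint directly. The endpoint, key, and helper names are placeholder assumptions (LinkedIn routes this traffic through its internal GaaP gateway, and in the scaled design the locale is pre-tagged asynchronously rather than detected inline), but the comparison logic is the same:

    import requests

    AZURE_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
    AZURE_KEY = "<subscription-key>"  # placeholder

    def detect_locale(text):
        """Return (ISO 639-1 language code, confidence) for a content snippet."""
        resp = requests.post(
            f"{AZURE_ENDPOINT}/text/analytics/v3.0/languages",
            headers={"Ocp-Apim-Subscription-Key": AZURE_KEY},
            json={"documents": [{"id": "1", "text": text}]},
        )
        resp.raise_for_status()
        lang = resp.json()["documents"][0]["detectedLanguage"]
        return lang["iso6391Name"], lang["confidenceScore"]

    def should_offer_translation(post_text, member_interface_locale):
        """Render the 'See translation' button only when the post's locale
        differs from the member's interface locale."""
        detected, confidence = detect_locale(post_text)
        return confidence > 0.8 and detected != member_interface_locale

    # A French post viewed with an English interface locale -> True
    print(should_offer_translation("Bienvenue sur LinkedIn", "en"))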
This translation service also has features like logic for protecting entities such as hashtags and name mentions from being distorted in translation (a rough sketch of this placeholder idea appears after the OpenNMT overview below), and integrated filters to block irrelevant or unprofessional content, as well as advertisements, from being translated on the LinkedIn platform. We also use an in-memory encrypted cache to reduce latency; with its lightweight maintenance and better cost-to-serve than centralized solutions, and built on the Java Play framework used at LinkedIn, the service easily supported multiple thousands of QPS during our prototype ramp.

Acknowledgements

Many thanks to Weizhi (Sam) Meng and Chang Liu for great coding and ownership, to David Snider for initiating the project, and to Annie Lin for writing GaaP scripts. We also want to thank Ian Fox for his work with Azure, Pradeepta Dash for engineering support for the feed, Atul Purohit for guidance with the feed API implementation, Jeremy Kao for guidance with web, Samish Kolli for client-side support, Nathan Hibner for his many contributions in tweaking the model, and Chao Zhang for the expert answers about overall backend functionality. Additionally, we want to recognize our helpful friends at Microsoft: Ashish Makadia, Assaf Israel, and Brian Smith from the Text Analytics team, and Chris Wendt and Arul Menezes from the Translator team. Finally, a huge thank you to Francis Tsang and Tetyana Bruevich for their endless support. We hope our members enjoy this new feature!

OpenNMT: an open source neural machine translation system

OpenNMT is an open source (MIT) initiative for neural machine translation and neural sequence modeling.

[simple-attn.png]

Since its launch in December 2016, OpenNMT has become a collection of implementations targeting both academia and industry. The systems are designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art accuracy. OpenNMT currently has three main implementations:

OpenNMT-lua (a.k.a. OpenNMT): the original project, developed with LuaTorch. Full-featured, optimized, and stable code ready for quick experiments and production.

OpenNMT-py: an OpenNMT-lua clone using the more modern PyTorch. Initially created by the Facebook AI research team as an example, this implementation is easier to extend and particularly suited for research.

OpenNMT-tf: a TensorFlow alternative. The more recent project, focusing on large-scale experiments and high-performance model serving using the latest TensorFlow features.

All versions are currently maintained. Common features include: a simple general-purpose interface, requiring only source/target files; highly configurable models and training procedures; recent research features to improve system performance; extensions to allow other sequence generation tasks such as summarization, image-to-text, or speech recognition; and an active community welcoming both academic and industrial requests and contributions.
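Circling back to the entity protection mentioned in the LinkedIn section above: a common way to keep hashtags and mentions intact is to swap them for opaque placeholders before translation and restore them afterwards. The sketch below is a hypothetical illustration of that idea, not LinkedIn's actual code; production systems would typically use the translation API's own no-translate markup, since bare placeholders can themselves be mangled by MT:

    import re

    TOKEN = re.compile(r"[#@]\w+")  # hashtags and @mentions

    def protect(text):
        """Replace hashtags/mentions with placeholders the MT engine passes through."""
        entities = TOKEN.findall(text)
        for i, ent in enumerate(entities):
            text = text.replace(ent, f"__ENT{i}__", 1)
        return text, entities

    def restore(text, entities):
        """Put the original entities back after translation."""
        for i, ent in enumerate(entities):
            text = text.replace(f"__ENT{i}__", ent, 1)
        return text

    def translate_protected(text, translate):
        masked, entities = protect(text)
        return restore(translate(masked), entities)

    # Usage with a stand-in translator:
    fake_mt = lambda s: s.replace("Bonjour", "Hello")
    print(translate_protected("Bonjour @susanli #NLP", fake_mt))
    # -> "Hello @susanli #NLP"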
Sabrina Girletti, Doctoral Assistant, Faculty of Translation and Interpreting (FTI), University of Geneva
Phone: +41 22 37 98685. Office: 6339 - Uni Mail. Email: Sabrina.Girletti(at)unige.ch

Teaching activities: Localisation; Machine Translation I.
Research interests: localisation, machine translation, post-editing (MT), CAT tools.

Summary: Sabrina Girletti is a research and teaching assistant at the Translation Technology Department of the Faculty of Translation and Interpreting (FTI), where she contributes to postgraduate courses in machine translation and localisation. Her research interests include post-editing approaches and human factors in machine translation. She is currently involved in a project testing the implementation of machine translation at Swiss Post. Sabrina holds a master's degree in Translation, with a specialisation in Translation Technologies, from the University of Geneva, and a bachelor's degree in Linguistic and Cultural Mediation from the University of Naples "L'Orientale".

Machine Translation (MT)

Leverage a tailor-made machine translation engine based on your company's unique data.

Customized Engines, All Private: Rather than using error-prone free machine translation services, Venga leverages commercial machine translation (MT) engines. After factoring in your insights, we customize everything to reflect your company's precise localization and budgetary needs. The end result is a private MT engine which is totally in tune with your content.
Built in Days, Not Months: In a business world where time is always of the essence, MT-handled localization projects can measurably cut costs and save time. This is especially important if rapid turnarounds are essential for your company's success. To ensure speedier access to major overseas markets, our engineers can build you a private, customized MT engine in just days.

Light or Heavy Post-Editing: With sufficient preparation and customization, our MT engines yield context-correct translations requiring only minimal post-editing. Depending on your needs, our language specialists can then conduct either light or heavy post-editing to ensure all translated documents are of consistently high quality.

Data Refining Process: By seamlessly integrating your glossaries and translation memories into your company's private MT engine, our team will help you measurably improve content quality. By making it easier to identify issues and correct errors in your translation assets, they will also further enhance your MT engine's accuracy and quality.

Machine Translation Analytics: Our customized MT engine will also provide your project with translation statistics documenting the percentages you have leveraged from your previously approved translated content as compared to human translations of the same text. In other words, our analytics will detail the evolution of your company's MT engine. For more information, download our Machine Translation Service Description.

Who we are: Having originated in the software industry, we use our twenty-plus years' experience globalizing information-based technology products to help our clients succeed internationally. Venga offers translation, localization, and global creative services to enable clients in any industry to reach new markets faster.

Lucy LT - The Machine Translation Solution

Lucy LT - for secure, cost-effective international communication. If you want to communicate internationally in multiple languages, increase revenue by reaching audiences in additional languages, cut translation costs, and reduce translation turnaround times, then check out what Lucy LT has to offer. Lucy LT is already helping these customers to communicate more efficiently in international markets. What can we do for you?

Your benefits: Lucy LT is secure. Lucy LT supports multiple text formats. Lucy LT offers a great number of language combinations. Lucy LT integrates with TM systems such as SDL Trados. Lucy LT can be embedded in end-to-end documentation processes with editors such as Adobe InDesign. Lucy LT is modular, adaptable, and scalable. Lucy LT is fast. Lucy LT is efficient (small hardware footprint).

Do you need a real-time translation for gisting purposes? Check out our online MT system, KWIK Translator.

Contact: Lucy Software and Services GmbH, Neidensteiner Str. 2, D-74915 Waibstadt. Tel. +49 7263-40930-0, info@lucysoftware.com